Review
A review on property estimation methods and computational schemes for rational solvent design: A focus on pharmaceuticals Harini Madakashira, Jhumpa Adhikari, and K. Yamuna Rani Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/ie301329y • Publication Date (Web): 26 Apr 2013 Downloaded from http://pubs.acs.org on April 27, 2013
Industrial & Engineering Chemistry Research
A review on property estimation methods and computational schemes for rational solvent design: A focus on pharmaceuticals
M. Harini, Jhumpa Adhikari, K. Yamuna Rani*
Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai-400076
Chemical and Energy Engineering Division, Indian Institute of Chemical Technology, Hyderabad-500607
Abstract
This paper provides a review of the available literature on computational schemes for rational solvent design, with a focus on solvent extraction and crystallization (the two most common unit operations) in the pharmaceutical industry. The computer aided design of solvents is important as a cost effective tool, especially given the regular development of new pharmaceutical molecules. Also, there is a need to minimize the amount and the number of solvents used because of environmental, health and toxicological concerns. This review covers the properties of interest and the predictive methods for estimating these properties in solvent design, including group contribution based methods, quantitative structure property prediction methods and molecular modeling methods. Additionally, the various optimization approaches for rational solvent design, such as outer approximation, branch and bound, simulated annealing and genetic algorithms, are discussed.
Keywords: product design, property prediction, optimization, solvent, pharmaceuticals, crystallization, extraction.
*Author to whom correspondence should be addressed; electronic mail:
[email protected];
[email protected] ACS Paragon Plus Environment
Contents
Introduction
Property Estimation Methods
  Group contribution based methods
  QSPR methods
  Utility of various topological indices
  Molecular Modeling and Simulation
Properties of Interest for Solvent Selection
  Solvent Extraction
  Infinite dilution activity coefficient
  Crystallization
  Hydrogen bonding solubility parameter
  n-octanol/water partition coefficient
  Solubility parameter
  Toxicity
Rational Solvent Design Approaches
  Constraints
  Property Constraints
  Structural Constraints
  Practicality Constraints
  Optimization Approaches
  Deterministic Approaches
  Branch and bound
  Outer approximation
  Stochastic Approaches
  Simulated annealing
  Genetic algorithm
Solvent Design for Pharmaceuticals
  Solvent extraction
  Crystallization
Summary
Acknowledgement
References
Appendix
Introduction
The design and development of new chemical based products for applications such as drugs, cosmetics, pesticides and food is important and requires validation. Over the last few years, process engineering and process design have been slowly evolving into the newer area of product engineering and product design, whose goals are entirely different from those of the former. According to Moggridge and Cussler,1 chemical product design is a procedure defining “what we need, generating ideas to meet this need, screening and selecting the best of the ideas and finally deciding what the product should look like and how it should be manufactured”. An alternate definition given by Hill2 is that product design is a general procedure for structured products consisting of six steps: “consumer need identification, conceptual product design, identification of active ingredient, incorporation of active ingredient into a physical prototype, assessing it against relevant criteria and experimental refinement in prototype based on measured results”.
In chemical product design, with knowledge of the desired behavior and properties, we attempt to identify the final product. Occasionally, the product may also be manipulated in order to obtain the desired product behavior. These types of products are commonly known as formulations, where an additive, when added to a chemical or non-chemical product, enhances its properties. Therefore, the problem is to find the appropriate chemical that will exhibit the desired behavior; for example: enhanced profit, increased operational efficiency, positive environmental impact, low toxicity, etc. Since millions of compounds exist, it is difficult to find an appropriate chemical that meets the specific needs by direct experimentation alone. In addition, as experimental measurements are often time consuming and expensive, predictive methods can replace measurements if the estimates are sufficiently good. Many researchers have proposed methodologies that have contributed significantly to reducing the time and cost of the experimental effort.
Development of systematic methodologies for the design of chemical based products has been attempted by various researchers.3-8 A multi-step and multi-level approach consisting of problem formulation (pre-design step), compound identification (design step) and result analysis (post-design step) has been proposed for computer aided product design by Harper and Gani, along with a description of the roles of the different steps and the tools needed in each step. Later, Charpentier4 proposed a ‘3PE’ approach, ‘the triplet molecular product process engineering’, for successful product development involving complex, multidisciplinary, non-linear and non-equilibrium phenomena occurring at different length and time scales, in order to understand how physical and bio-chemical phenomena at a smaller length scale relate to properties and behavior at a longer length scale.
A computer aided molecular design (CAMD) methodology for the design of optimal solvents and solvent mixtures using a sub-problem approach has been illustrated through case studies by Karunanithi et al.5 This methodology makes use of a decomposition based solution strategy, where the number of feasible molecules is systematically reduced in subsequent levels by partitioning the constraints. Later, for a period, the focus was on the methodology of integrated product and process design; Smith and Ierapepritou6 have presented a review on the need for integrative product design in view of the current challenges in the chemical process industry. Bommareddy et al.7 have presented an algorithm based on an algebraic approach for the simultaneous solution of product and process design problems. In this approach, the primary problem identifies property targets corresponding to the desired process performance and the secondary problem discovers the molecular structures that match the property targets identified. Recently, a systematic methodology with an integrated three stage approach has been proposed for product design and verification of liquid formulations, such that stage-1 generates a list of feasible product candidates, stage-2 deals with planning and execution of experiments and stage-3 involves product validation.8
There is a wealth of literature available on physical and chemical property prediction of compounds based on their structure. Target compounds reported in the literature include solvents,9 refrigerants,10 polymers11 and ionic liquids.12 Each class of targeted compounds is associated with a specific target property and its target value. The target property and its range vary with the size and complexity of the chemical product. For example, the target property for solvent design, involving relatively small molecules, relates to macroscopic scales, whereas that for drug design, involving large molecules, relates to microscopic and mesoscopic scales. Gani13 has illustrated a methodology to solve chemical product design problems using computer aided methods and tools, and proposed an approach to predict a wide range of physical and thermodynamic properties with the help of a molecular description of the targeted compounds. Further, the need for thermodynamic modeling in chemical product design for complex chemical products such as detergents, paints and polymers has been outlined.14 Recently, O’Connell et al.15 have provided a comprehensive review and perspective on thermodynamic property modeling which can be applied to product engineering.
Solvents have been widely used for centuries in various industries such as chemicals, petrochemicals, pharmaceuticals, leather, cosmetics, food and beverages, etc., to carry out reaction, formulation, separation and cleaning of equipment. In process industries, solvents are used in various steps such as separation (of gas, liquid and/or solid), reaction (as reaction medium, reactant, and carrier) and washing. Apart from these, solvents are used in paints, textiles, rubber, adhesives, cleaning reagents (dry cleaning, washing), etc., for product formulations.
The pharmaceutical industry is one of the largest users of organic solvents per unit of the final product. Solvents constitute 56% of the mass in the manufacture of an active pharmaceutical ingredient (API).16 The presence of a solvent can affect a wide variety of important factors in a reaction, such as controlling the reaction temperature, altering the chemical kinetics and even affecting chemical equilibrium. Pharmaceutical molecules, because of their high polarizability, conformational flexibility and multiple functional groups, are very different from the common petrochemicals and therefore require special attention. It is essential to produce pharmaceutical products of high purity and consistent quality in high yield. To meet these demands, solvent selection plays a vital role but is generally given the least consideration by pharmaceutical chemists. An article by Nicponski and Ramachandran17 on the role of solvent selection at different stages in the pharmaceutical industry discusses the factors that are to be considered in solvent selection, such as cost, ease of recoverability, recyclability, inherent reactivity, environmental effects, etc., and their disposal effects in the current era, thus affirming the need for greener solvent design.
Recently, Henderson et al.16 have highlighted the issues that one encounters in solvent selection and reported a database of solvents used in pharmaceuticals along with their properties such as toxicity, vapor pressure, boiling point, melting point, flammability, environmental impact, waste and recycling ability. A review on organic solvents used in drug research by Grodowska and Parczewski18 reports the division of solvents into classes based on their toxicity and environmental hazard. The article also discusses the problems encountered while using organic solvents in each step (reaction, formulation, separation) of the synthesis of the API. The different methods employed for the removal of undesirable and toxic residual solvents, so as to maintain their concentrations within acceptable limits, are discussed in detail. The authors have also discussed ways to avoid organic solvents by choosing new alternatives such as supercritical fluids and ionic liquids. In the last few years, research on solvent design for pharmaceuticals has been carried out by various researchers.19-23 Some have investigated the therapeutic effect of the drug.22,23 Others have explored the physical property aspects and the control of the manufacturing process of the drug (i.e. improving the yield).18-21
There is a need to identify greener solvents with similar or enhanced performance when compared to the existing solvents in the chemical and pharmaceutical industries. In this work, a review of the literature on the methodologies available for the prediction of product (solvent) properties from the molecular structure, along with the various optimization approaches to solve the solvent design problem, is presented. In the literature on molecular design, the property of interest of the molecule is estimated with good accuracy by three methods; namely, group contribution (GC), quantitative structure property prediction (QSPR) and molecular simulations. The advantages and disadvantages of these methods are discussed. Among the methods, the GC based Marrero-Gani (MG) method is widely used by various researchers for property estimation. Both deterministic and stochastic optimization approaches have been widely employed, with slight modifications, to attain the global solution. The structure and property constraints employed to formulate the optimization problem are discussed. Further, the properties of interest that are considered in the literature for solvent selection are discussed in detail. Finally, the literature available on the application of the solvent design methods to pharmaceuticals is reviewed. The last section summarizes the literature reviewed in this paper. A roadmap representing the outline of this article is shown in Figure 1.
Figure 1. Flowchart / Roadmap of CAMD framework
[Figure not reproduced. Labels recoverable from the flowchart: reverse design (need, property, structure); target compounds (refrigerants, polymers, solvents, ...); selection of basic groups (UNIFAC groups, Marrero-Gani groups, base groups for CI), which depends on the property prediction method used; property estimation methods (GC based methods: Joback GC, Constantinou-Gani, Marrero-Gani, position GC, GIC; QSPR: CI, GC-CI, MG-CI, signature descriptors; molecular simulation), with the note that GC based methods do not give information about the connectivity of the molecule; topological indices (connectivity index, volumetric connectivity index, mass connectivity index, electronegativity index, bond based linear indices, Lu index); structural feasibility constraints (UNIFAC groups based, octet rule based, adjacency matrix based) to generate structurally feasible aliphatic, aromatic, acyclic molecules; property constraints, chosen with bounds based on the need and the problem definition; optimization (deterministic methods: outer approximation, branch and bound, interval analysis; stochastic methods: simulated annealing, genetic algorithm, tabu search) to minimize/maximize an objective function with imposed structural and property constraints; list of feasible molecules; worked out example for melting point estimation using structure property predictions; case studies on solvent extraction and crystallization using CAMD.]
Property Estimation Methods
Based on its definition, chemical product design can be described as “reverse property prediction”, since the molecular structure is predicted for the desired property. The problem type can be molecular and mixture design,24 process design synthesis and evaluation,25 process and product design,26 process solvent design,27 etc. Once the problem is formulated, the inputs and the constraints (structure and property) are set based on the process objectives. New molecules are designed and then checked for whether they can be synthesized, for the availability of raw materials and for their environmental impact, and the desired property of interest is validated. In chemical product design, the molecular design problem is transformed into a CAMD problem, which incorporates optimization techniques along with molecular structure-property relationships. A book on molecular systems engineering edited by Adjiman and Galindo28 presents, in brief, the structure-property correlations and optimization based approaches to CAMD along with solvent design for reactions.
In the literature, various structure property prediction methods are reported for the design of new molecules. They can be broadly categorized as GC based methods, QSPR based methods and their combinations, and molecular simulations. Different GC based methods, namely the general GC method, the Constantinou-Gani method, the group interaction contribution (GIC) method, the MG method and the position group contribution method, are discussed first. This is followed by a discussion of the QSPR methods: the connectivity indices (CI) method and the molecular signature descriptors method, with a focus on the various topological indices that are reported in the literature and are of interest in the context of this article. Later, the methods based on the combined approach, the GC-CI and MG-CI methods, are also described. The flowchart in Figure 1 lists the various structure property prediction methods that are discussed in this article.
Group contribution based methods
The GC method is a well established technique in the literature, developed by Lydersen in 1955, to estimate pure component critical properties from molecular structure. Since an infinite number of chemical compounds exist but only a limited number of functional groups, it is convenient to estimate functional group parameters from existing data, and then predict the properties of new compounds. Deal and Derr29 were the first to review the use of molecular structure for making quantitative estimates of activities in mixtures of simple organic compounds. The estimates were found by interpolating and extrapolating from one system to another using the idea of characteristic structural group contributions. Research on GC based formulations aided in overcoming the dearth of experimental data, and provided fast and intuitive tools for the prediction of pure component properties.30 The simplest form of GC determines the physical property by summing, over the structural groups in the molecule, the product of each group's contribution and the number of times that group appears (i.e. assuming linear additive dependence), as shown in the appendix of this article. This simple approach was first used in the Joback method for the prediction of thermo-physical and transport properties such as critical state data, heat capacity, viscosity, etc., and it worked well for a ‘limited range’ of components.31 The first order GCs have been developed with 40 groups for organic compounds containing halogens, oxygen, nitrogen and sulfur. The UNIQUAC functional-group activity coefficients (UNIFAC) method developed by Fredenslund et al.32 uses the GC method to estimate activity coefficients, which along with other properties, such as vapor pressures in the modified Raoult’s law, are used to predict vapor-liquid equilibrium in mixtures. However, the GC method predicts the same result for all isomers and does not differentiate among these molecules. Moreover, reliable GC models are available only for a limited number of thermodynamic properties and it is not possible to represent all atomic arrangements. Thus, it is difficult to accurately predict the properties of complex molecules (e.g. heterocyclics). To address a few of these disadvantages of GC, a general methodology for CAMD using the GC approach, which can handle molecules of various degrees of complexity and size, has been proposed by Constantinou et al.33 This approach categorizes groups into first and second order, where the second order groups have the first order groups as building blocks, and is represented as follows:
$$ f(p) = \sum_i N_i C_i + W \sum_j M_j D_j \qquad (1) $$
In equation 1, Ci and Dj are the first order and second order GCs, respectively, and Ni and Mj correspond to the number of first order and second order occurrences in the compound. The constant W is set equal to unity if the second order term is to be used, and to zero otherwise. This two level approach consists of 63 first order groups and 40 second order groups. Five case studies33 where this approach has been successfully applied include solvent design for liquid-liquid extraction and azeotropic distillation, design of polymers, solute design for dehydration by supercritical extraction, and design of low cost solvent blends for coatings. Although the property estimates of a class of molecules were improved after the addition of second order groups, this method also suffers from the same limitations as observed in GC. An example of a melting point estimate using this method is shown in the Appendix along with the average absolute error (AAE) values. The AAE is defined as:
4
∑θ AAE =
est i
− θ iexp t
(2)
N
where θ_i^est is the value of the property θ estimated by regression for compound i, θ_i^expt is the experimental value of the property θ for compound i, and N is the number of data points.
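The two-level sum of equation 1 and the AAE of equation 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the choice of 2-propanol as the test molecule, and all numerical contribution values are hypothetical placeholders, not the regressed parameters of Constantinou et al.

```python
def gc_estimate(first_counts, first_contribs,
                second_counts=None, second_contribs=None):
    """Return f(p) = sum_i N_i*C_i + W * sum_j M_j*D_j (equation 1).

    W is taken as 0 when no second order groups are supplied, else 1.
    """
    first_order = sum(n * first_contribs[g] for g, n in first_counts.items())
    if second_counts is None:
        return first_order  # first order estimate only (W = 0)
    if second_contribs is None:
        raise ValueError("second order contributions are required when W = 1")
    second_order = sum(m * second_contribs[g] for g, m in second_counts.items())
    return first_order + second_order  # W = 1


def average_absolute_error(estimated, experimental):
    """Return AAE = (1/N) * sum_i |theta_est_i - theta_expt_i| (equation 2)."""
    if len(estimated) != len(experimental) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(e - x) for e, x in zip(estimated, experimental))
    return total / len(estimated)


# Hypothetical contribution values, for illustration only.
C = {"CH3": 1.0, "CH": 0.25, "OH": 3.0}   # first order contributions C_i
D = {"CHOH": 0.5}                          # second order contributions D_j

# 2-propanol described as 2 x CH3, 1 x CH, 1 x OH, plus one CHOH second order group.
first_counts = {"CH3": 2, "CH": 1, "OH": 1}
f_first = gc_estimate(first_counts, C)                    # 5.25
f_second = gc_estimate(first_counts, C, {"CHOH": 1}, D)   # 5.75
```

Note that `f_first` would be identical for 1-propanol if it were described with the same group counts, which is the isomer limitation noted above; the second order term is what separates the two.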
Numerous attempts have been made to overcome the limitations of GC methods. Eladio and Raman developed a model applicable to mixtures, where the property is determined by GIC, which considers the contributions of the different interactions between bonding groups present in a given molecule instead of the contributions of structural groups.24,34 The models based on GIC were observed to give a better estimate when compared with GC and can be used to distinguish between isomers in mixtures, but they require a large number of model parameters. Their results also show that the estimates of activity coefficients obtained by GIC are very close to those obtained by the UNIFAC method, and can be used for isomers. The accuracy of the prediction depends on how accurately the model parameters are determined.
Subsequently, a modified GC based estimation method for pure organic compounds, named the MG method, has been developed to increase the accuracy and applicability of the prediction by accounting for more complex heterocyclic and large polyfunctional alicyclic compounds.35 A data set of more than 2000 compounds ranging from 3 to 60 carbon atoms, including large and complex polycyclic compounds, has been used to develop the correlations. The property estimation model has the form of the following equation:
$$ f(X) = \sum_i N_i C_i + w \sum_j M_j D_j + z \sum_k O_k E_k \qquad (3) $$
where Ci, Dj and Ek are the contributions of the first, second and third order groups of type i, j and k that occur Ni, Mj and Ok times, respectively. In the first level of estimation, the constants w and z are assigned zero values because only first order groups are employed. In the second level, the constants w and z are assigned unity and zero values, respectively, because only first and second order groups are involved, while in the third level, both w and z are set to unity. The left-hand side of equation 3 is a simple function f(X) of the target property X (such as an exponential function of the ratio of the property value to an adjustable parameter, or just the difference between the property value and the adjustable parameter). Three levels of molecular groups have been identified, which are termed first order, second order and third order groups. The first level corresponds to simple and mono-functional compounds; the second level to polyfunctional compounds and to aromatic and aliphatic compounds with one ring; the third level to large complex polycyclic compounds. The proposed method was found to estimate properties with increased accuracy, low AAE and a wide range of applicability for chemical, biochemical and environmental compounds. Their results also indicate that for smaller, complex molecules, it is better to have a smaller set of second order groups, and for large polyfunctional compounds, larger third order groups give better results. Marrero and Gani35 have reported the contributions for 182 first order groups, 122 second order groups and 66 third order groups for the following properties: normal boiling point, critical temperature, critical pressure, critical volume, standard enthalpy of formation, standard enthalpy of vaporization, standard Gibbs energy, normal melting point and standard enthalpy of fusion. The determination of the representative example property, melting point, using this method is reported in the Appendix.
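The three estimation levels of equation 3, and the two forms of f(X) mentioned above, can be illustrated with a short Python sketch. The function names, group labels, contribution values and the adjustable parameter `x0` are all hypothetical and chosen only to make the arithmetic easy to follow; they are not the published MG parameters.

```python
import math


def mg_sum(counts, contribs, level):
    """Right-hand side of equation 3 at estimation level 1, 2 or 3.

    counts and contribs are dicts keyed by "first", "second" and "third",
    each mapping group labels to occurrences or contributions.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    w = 1 if level >= 2 else 0   # switches on the second order term
    z = 1 if level == 3 else 0   # switches on the third order term

    def weighted(order):
        return sum(n * contribs[order][g]
                   for g, n in counts.get(order, {}).items())

    return weighted("first") + w * weighted("second") + z * weighted("third")


def invert_exponential(rhs, x0):
    """Solve f(X) = exp(X / x0) = rhs for X."""
    return x0 * math.log(rhs)


def invert_difference(rhs, x0):
    """Solve f(X) = X - x0 = rhs for X."""
    return x0 + rhs


# Hypothetical groups and contribution values, for illustration only.
counts = {"first": {"CH3": 2, "CH2": 4}, "second": {"G2": 1}, "third": {}}
contribs = {"first": {"CH3": 1.0, "CH2": 0.5}, "second": {"G2": 0.25}, "third": {}}

level1 = mg_sum(counts, contribs, level=1)   # 4.0
level2 = mg_sum(counts, contribs, level=2)   # 4.25
level3 = mg_sum(counts, contribs, level=3)   # 4.25, no third order groups present
```

Keeping the group sum separate from the inversion of f(X) reflects the structure described above: the same sum can feed either functional form, depending on the property being estimated.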
Recently, a new GC method named the ‘position group contribution method’ has been proposed for the estimation of critical properties, boiling point and melting point of organic compounds.36 This method distinguishes isomers, including cis and trans structures, and takes into account the ortho, meta and para corrections in benzene rings and pyridines, which were not taken into account by MG and the other methods discussed above. A total of 730 compounds containing carbon, hydrogen, oxygen, nitrogen, chlorine, bromine and sulphur were used for the determination of group contributions for melting point. An example calculation is shown in the Appendix. This method was found to perform better and showed less deviation in prediction when compared with the Joback and Constantinou-Gani methods. Table 1 presents the AAE for various properties when the structure property prediction methods discussed above were used. The expression for the position GC method is as follows:
$$ f(X) = \sum_i A_i N_i + \sum_j A_j \tanh\!\left(\frac{N_j}{N}\right) + \sum_k A_k P_k + a_1 \exp\!\left(\frac{1}{M_w}\right) + a_2 \exp\!\left(\frac{1}{N}\right) \qquad (4) $$
where Ai or Aj represents the i or j group contributions, determined through minimization of the residual error by regression. Ni represents the number of groups in which a carbon element forms the center of the group, Nj represents the number of groups in which a non-carbon element forms the center, N is the total number of groups, N = Σi Ni + Σj Nj, Pk characterizes the position factor, which accounts for ortho, meta and para positions, Mw denotes the molecular weight, a1 and a2 are the coefficients of correlation, and f(X) is a simple function of the target property X.
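As a rough illustration of how the terms of equation 4 combine, the sketch below evaluates the right-hand side in Python. It assumes one reading of the expression: carbon-centered groups enter linearly, non-carbon-centered groups enter through tanh(Nj/N), position factors are treated as plain numbers, and the two correlation terms are a1·exp(1/Mw) and a2·exp(1/N). The function name and every numerical value are hypothetical.

```python
import math


def position_gc(carbon_groups, hetero_groups, position_terms, mw, a1, a2):
    """Evaluate the right-hand side of equation 4 under the stated reading.

    carbon_groups:  list of (A_i, N_i) for carbon-centered groups
    hetero_groups:  list of (A_j, N_j) for non-carbon-centered groups
    position_terms: list of (A_k, P_k) position corrections
    mw:             molecular weight M_w
    a1, a2:         correlation coefficients
    """
    # N is the total number of groups: sum of all N_i plus all N_j.
    n_total = sum(n for _, n in carbon_groups) + sum(n for _, n in hetero_groups)
    if n_total == 0:
        raise ValueError("the molecule must contain at least one group")
    carbon = sum(a * n for a, n in carbon_groups)
    hetero = sum(a * math.tanh(n / n_total) for a, n in hetero_groups)
    position = sum(a * p for a, p in position_terms)
    return (carbon + hetero + position
            + a1 * math.exp(1.0 / mw)
            + a2 * math.exp(1.0 / n_total))


# Hypothetical inputs, for illustration only.
value = position_gc(
    carbon_groups=[(2.0, 3)],
    hetero_groups=[(1.0, 1)],
    position_terms=[(0.5, 2)],
    mw=50.0, a1=1.0, a2=1.0,
)
```

Because the position terms are additive, two isomers with identical group counts but different ortho, meta or para patterns receive different values of f(X), which is how this method separates isomers.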
QSPR methods
The QSPR approach for property prediction represents the effect on the properties of interactions among different molecular groups based on their connectivities. QSPR is a technique to quantitatively correlate chemical structure and properties. The QSPR approach is widely used for the prediction of physico-chemical properties.37 It is based on the assumption that variations in the properties of compounds can be correlated with changes in their molecular features, characterized by the so-called “molecular descriptors”. In this method, the molecular structure is characterized by a number of topological, geometrical and quantum chemical descriptors, which are used to estimate the property of interest by multi-linear regression. Among the various descriptors, the topological indices (TIs) are widely used, since these indices offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, centricity and complexity. The molecular CIs are the most commonly used topological descriptors that provide a quantitative characterization of skeletal variation in a molecule. These descriptors are based on substructure features in the molecular graph, such as bonds, clusters and rings.
20
Graph theory is the branch of mathematics that deals with connected objects. The objects in a graph are called vertices and the lines connecting them are called edges. The analogy with a chemical system is as follows: the vertices represent atoms, molecules or molecular groups, and the edges represent bonds and interactions. For simplicity, molecular graphs are generally represented as hydrogen-suppressed graphs.38 Representing a molecule as a graph is the first step in the development of any topological index. The central problem in QSPR is to convert chemical structures into molecular descriptors that are relevant to a given physico-chemical property. Many physico-chemical properties can be satisfactorily correlated with topostructural or topochemical features. In principle, these are mathematical objects without a precise physical meaning. Topological descriptors are calculated using information on the connectivity of atoms/groups within a molecule. Consequently, they contain information about constitution, size, shape and branching, whereas bond lengths, bond angles and torsion angles are neglected.
In 1975, Milan Randic proposed an algorithm to characterize bond contributions to a molecular branching index. A molecular CI is calculated by counting all bonded atoms other than hydrogen in the molecular structure and assigning a “δ” value (cardinal number) to each atom. The “δ” value of an atom equals the number of adjacent non-hydrogen atoms. The “δ” values of the two atoms forming a bond pair define a bond value, and the bond values are then summed over all bonds (single, double, triple) in the chemical structure to calculate ^pχ, the pth order Randic index,39 as shown in equation 5.

$${}^{p}\chi = \sum_{\text{edges } ij} (\delta_i \delta_j)^{-1/2} \qquad (5)$$
where i and j are adjacent atoms forming a bond pair in the structure, and δ_i and δ_j are the atom connectivities in the molecular graph.
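The edge sum in equation 5 can be illustrated with a short sketch. It assumes the molecule is supplied as a hydrogen-suppressed bond list in which each bond appears once regardless of bond order; the function name and the two example molecules are illustrative choices, not taken from the cited work.

```python
import math


def randic_index(edges):
    """Sum (delta_i * delta_j) ** -0.5 over the bonds of a hydrogen-suppressed graph.

    edges: list of (i, j) atom-label pairs, one entry per bond.
    """
    # delta of an atom = number of adjacent non-hydrogen atoms
    delta = {}
    for i, j in edges:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    return sum(1.0 / math.sqrt(delta[i] * delta[j]) for i, j in edges)


# Carbon skeletons of two C4H10 isomers
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]

print(randic_index(n_butane))   # linear skeleton
print(randic_index(isobutane))  # branched skeleton gives a smaller value
```

The two isomers have the same atom count but different index values, which is the sense in which the index measures branching.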
The Randic index was later modified by Kier and Hall40 by decomposing the graph into subgraphs, each of which may consist of a single atom, a single edge or a set of connected edges (a path) in which no vertex is included twice. The index is then summed over the subgraphs. Subsequently, the valence CI is obtained by summing over all type ‘t’ subgraphs with ‘m’ edges using the valence δ (δ^v) values, which take into account the presence of multiple bonds and heteroatoms. The expression for the valence connectivity index is as follows:

$${}^{m}\chi^{v}_{t} = \sum_{i=1}^{N} \prod_{k=1}^{m+1} (\delta_k^{v})^{-1/2} \qquad (6)$$

where

$$\delta_k^{v} = \frac{Z_k^{v} - H_k}{Z_k - Z_k^{v} - 1} \qquad (7)$$
In equation 7, Z_k is the total number of electrons in the kth atom, Z_k^v is the number of valence electrons in the kth atom, and H_k is the number of hydrogen atoms directly attached to the kth non-hydrogen atom. In equation 6, m = 0 for atomic valence connectivity indices, m = 1 for one-bond path valence connectivity indices, m = 2 for two-bond fragment valence connectivity indices, and m = 3 for three contiguous bond fragment valence connectivity indices.
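A minimal sketch of equations 6 and 7 for the one-bond case (m = 1) follows. It assumes the usual Kier-Hall form, in which each bond contributes the product of (δ^v)^(-1/2) over its two atoms; the function names and the ethanol example are illustrative and not part of the source.

```python
import math


def valence_delta(z_total, z_valence, n_hydrogens):
    """Valence delta of a non-hydrogen atom, equation 7."""
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


def first_order_valence_index(deltas, edges):
    """m = 1 valence connectivity index: sum over bonds of (dv_i * dv_j) ** -0.5."""
    return sum(1.0 / math.sqrt(deltas[i] * deltas[j]) for i, j in edges)


# Ethanol, CH3-CH2-OH, as a hydrogen-suppressed graph
ethanol_deltas = {
    "CH3": valence_delta(6, 4, 3),  # carbon: Z = 6, Zv = 4, three H
    "CH2": valence_delta(6, 4, 2),  # carbon with two H
    "OH": valence_delta(8, 6, 1),   # oxygen: Z = 8, Zv = 6, one H
}
ethanol_edges = [("CH3", "CH2"), ("CH2", "OH")]

chi1v = first_order_valence_index(ethanol_deltas, ethanol_edges)
print(round(chi1v, 4))
```

Because the hydrogen count enters through H_k, the oxygen atom receives a different δ^v from a carbon atom with the same number of neighbours, which is how heteroatoms are distinguished.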
These indices show that the hydrogen atom count is in fact included in the calculation of indices developed from the graph, which supports the use of the term hydrogen-implied rather than hydrogen-suppressed graph. The general expression for property prediction using these connectivity indices is as follows:

$$P = \frac{1}{n}\left({}^{0}C\,{}^{0}\chi + {}^{0}C^{v}\,{}^{0}\chi^{v} + {}^{1}C\,{}^{1}\chi + {}^{1}C^{v}\,{}^{1}\chi^{v} + \text{constant}\right) \qquad (8)$$
where P is the property of interest, C represents the regression coefficient for each term, ^0χ and ^1χ are the zeroth and first order molecular connectivity indices, ^0χ^v and ^1χ^v are the corresponding valence connectivity indices, and n is the total number of groups present in the molecule. The structure–property correlations developed using CIs are more rigorous than those based on the GC method. The contribution of each individual group in a molecule to the estimated property in the GC method is comparable to the two zeroth order molecular connectivity indices (^0χ, ^0χ^v), which depend on the identities of the basic groups present in the molecule. However, the advantage of the CI over the GC method appears when the first order CIs (^1χ, ^1χ^v), which depend on the bonding characteristics of the basic groups in the molecule, are employed. In other words, the first order CIs distinguish isomers better than the zeroth order CIs. Furthermore, when CAMD is performed using this method, a complete molecular structure is obtained, rather than the list of groups obtained with GC, which must be further arranged in the optimization step. Although this method offers advantages over the GC method, its main limitation is that it requires knowledge of the correlation coefficients, which are regressed from a database of compounds. Several other indices apart from CIs have been reported in the literature; these are based on factors such as the mass, electronegativity, volume and dipole moment of each group, are used in accurate property estimation, and are discussed below. However, many such relationships can be very specific to certain classes of molecules, which typically limits their application (Table 2). Additionally, the selection of a set of indices for estimating a property remains an area of research. Representative calculations for estimating melting point using this CI method are given in the Appendix.
Hu et al.41 have reported a detailed review of twenty-six TIs, nine different matrices expressing molecular structure and eight methods for dealing with different molecular graphs. They have also proposed a new ‘variable index’ for molecules containing heteroatoms. In attempts to predict the physical and thermodynamic properties of polymers, a more universally applicable QSPR correlation was developed by Bicerano42 using experimental data for more than 400 polymers and involving various descriptors. In general, if two or more chemical graphs have the same TI, the TI is said to be degenerate. The Wiener index was the first TI based on the topological distance matrix. Balaban43 distinguished first, second and third generation TIs, which transmit information on properties but not on structures. Among the first generation TIs (e.g. Wiener, Hosoya and centric indices), the Wiener index has high degeneracy. Balaban observed that for the inverse problem such indices would lead to a combinatorial explosion of solutions. In comparison, the second generation (e.g. Randic index, Kier-Hall index, Balaban’s J index) and third generation indices (e.g. triplet indices, BCUT indices) have low degeneracy and are uniquely associated with chemical structures, but it is difficult to solve the inverse problem in a reasonable amount of time. Recently, Katritzky and co-workers44 reviewed QSPR studies correlating physical and chemical properties with chemical structures. The review mainly focuses on structural descriptors derived from chemical structures for the correlation and prediction of various physical and chemical properties. QSPR approaches with various modeling procedures, both linear and nonlinear, such as multi-linear regression, principal component regression, artificial neural networks, genetic algorithms and support vector machines, were discussed in detail. In addition, estimation methods for important physical and chemical properties such as boiling point, heat of fusion, viscosity, refractive index, density, solubility, rate constants, stability and dissolution constants, and glass transition temperature, along with correlations using different approaches, were reported. Numerous TIs (around 400) can be found in the literature41 to correlate molecular structure and properties. However, only a few of them have found wide application. The “connectivity index” is the most widely used TI. Table 2 lists various low-degeneracy indices recently introduced in the literature for organic compounds, for a range of properties of interest, which can be utilized to design new molecules.
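The notion of degeneracy can be made concrete with the Wiener index, taken here in its usual form as the sum of topological (bond-count) distances over all pairs of non-hydrogen atoms. The sketch below is illustrative only: the function name is hypothetical, the graph is assumed to be connected, and the two heptane isomers are examples chosen for this note rather than cases discussed in the cited reviews.

```python
from collections import deque


def wiener_index(n_atoms, edges):
    """Sum of shortest-path distances over all atom pairs of a connected graph."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)

    total = 0
    for source in range(n_atoms):
        # Breadth-first search gives bond-count distances from `source`
        dist = [-1] * n_atoms
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist)
    return total // 2  # each pair was counted from both ends


# Carbon skeletons of two C7H16 isomers
dimethylpentane_2_4 = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (3, 6)]
ethylpentane_3 = [(0, 1), (1, 2), (0, 3), (3, 4), (0, 5), (5, 6)]

print(wiener_index(7, dimethylpentane_2_4))
print(wiener_index(7, ethylpentane_3))
```

Both skeletons give the same value although the structures differ, so an inverse design problem posed on this index alone cannot tell them apart.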
Utility of various topological indices:
Different physical properties depend in distinct ways on the inherent structural features of a molecule. Many factors affect the prediction of a physical property of a molecule; the most apparent among them are molecular size, shape, polarity, electronegativity and hydrogen bonding. Tetko et al. have developed multi-platform software,45 called the Virtual Computational Chemistry Laboratory, which includes an index generation program that computes more than 1600 molecular descriptors that can be used to evaluate structure-property relationships. For properties such as solubility and the octanol/water partition coefficient, multiple prediction tools are available; however, the uncertainty and variability between predictions can be substantial. The software package CODESSA PRO (COmprehensive Descriptors for Structure and Statistical Analysis), developed by Katritzky and co-workers,44 enables the calculation of numerous (~1000) constitutional, topological, geometrical, thermodynamic, semi-empirical, quantum chemical and electrostatic descriptors solely on the basis of molecular structural information, and is widely used in predicting the properties of various classes of molecules.
With ionic liquids (ILs) emerging as new solvents, two indices have appeared in the literature that accurately predict the densities of ILs. The volumetric connectivity index (σ) was used to predict the density of 142 ILs, including imidazolium, pyridinium, pyrrolidinium, piperidinium, quaternary ammonium and quaternary phosphonium ILs, at room temperature. This index, when combined with the mass connectivity index (λ), can be used to predict the densities of ILs accurately at different temperatures. The detailed calculations for estimating these indices are reported by Xiong et al.46 Bond-based indices were found to be statistically significant compared with other molecular descriptors for the estimation of physical, chemical and biological properties such as boiling point, partition coefficient and antibacterial activity. Several promising results have been achieved in computational drug discovery with the use of these indices.47
The electronegativity descriptor has played a key role in the estimation of various properties, such as melting point, partial charges, boiling point and molar refraction, with low degeneracy. The Lu index48 (shown in Table 2), which is a modification of the well-established Wiener index,49 makes use of both the relative electronegativity and the relative bond length of vertices to correlate the normal boiling points and molar refractions of aldehydes and ketones. Apart from hydrocarbons, it is also applicable to organic compounds containing heteroatoms and multiple bonds. Index F50 (shown in Table 2) utilizes electronegativity between the groups to obtain a good QSPR model for heteroatom-containing organic and inorganic molecules. The reciprocal distance matrix of Wiener has been modified into multiple matrices that represent the topological properties of vertices, bonds, edges and vertex interactions in a molecular graph. QSPR models were obtained for properties such as the aqueous solubility and octanol/water partition coefficient of benzene halides using the electronegativity descriptor.51 Thus, it is evident from the above discussion that numerous indices exist for predicting different properties of various classes of molecules. Software packages such as CODESSA-PRO44 and DRAGON45 can assist in selecting descriptors, based on the class/type of molecule and the property of interest, that lead to a better correlation with experimental data.
To overcome the common bottleneck in property prediction when group contributions are not available, Gani et al.9 proposed an approach in which the contribution of the missing group is predicted using zeroth and first order CIs. This method is named the “combined group contribution-connectivity indices (GC-CI) method” and is found to predict the property much closer to the experimental value when compared with the MG method without any missing group contributions. However, in some cases the addition of missing GCs introduces large errors in property prediction, since the combined GC-CI method does not improve the accuracy of the original GC method (as shown in the Appendix). The AAE values of various properties are reported in Table 1. The uncertainties in using this method for property estimation have been quantified through maximum likelihood estimation of the GC and CI methods by Hukkerikar and co-workers.52
Recently, a new method based on QSPRs using molecular signature descriptors to identify potentially new molecules has been proposed by Weis and Visco.53 The signature was found to be a powerful tool for encoding the local neighborhood of a molecule, where a user-specified parameter called the signature height, ‘h’, determines the size of the local neighborhood. A subgraph centered at a specific root atom, including all atoms/bonds extending out to the predefined height h without backtracking along the path, is defined as an atomic signature of height ‘h’. It is denoted ^hσ_G(x) for the root atom x of the 2D graph G = (V, E), where V and E refer to the vertex (atom) set and edge (bond) set, respectively. The developed algorithm can unite a variety of property estimation models, based on GC and TI-based QSPRs, to track different property targets in molecular design. Additionally, the method can be employed for TIs of different signature heights. The applicability of this method has been demonstrated with a case study on the design of alkyl substituents for a fungicide by Chemmangattuvalappil et al.,54 in which affinity, mobility and retention properties were correlated using first order CIs and toxicity using GC. The accuracy of predictions made with this method depends on the correctness of the parameters of the GC and QSPR methods. It also depends on the range of the training set data, since the diversity of the inverse structures obtained would be severely restricted if the test set does not fall within the bounds of the training set. Some of the structure-property prediction methods are illustrated in the Appendix. The property chosen to illustrate the methods is the melting point and the example compound is 2,5-dimethyl benzoic acid.
Recently, Katritzky et al.,44 in their review on the utility of structure-property correlations, summarized the QSPR models developed for various properties, such as boiling point, critical pressure, heat of vaporization, heat of formation, aqueous solubility, partition coefficient and flash point, by type of compound (various classes of organic compounds), number of components, molecular descriptor used, correlation coefficient and standard deviation. In this article, we have summarized the AAE, correlation coefficient and number of components for each property for the various property estimation methods, as shown in Table 1. Among the various property prediction methods, we recommend the Marrero-Gani method, as it can be easily applied to a wide variety of molecules to predict various properties while offering better accuracy.
Molecular Modeling and Simulation
Molecular simulations are computer experiments that allow us to predict macroscopic properties by studying the behavior of a large number of particles.55, 56 They provide insight into the interactions and local structures of molecules. These simulations start from the microscopic structure and molecular interactions of the system and derive thermodynamic, transport or other properties based on the principles of statistical mechanics. In general, molecular simulations are widely used to predict the properties of materials; to understand the underlying molecular aspects of phenomena, which may lead to new experiments and the development of new theories; and to test approximate theories for the system of interest. There are two approaches to performing these simulations: stochastic and deterministic. The stochastic approach, called Monte Carlo (MC), is based on the random generation of configurations associated with a known probability. The deterministic approach, called molecular dynamics (MD), simulates the time evolution of the molecular system and provides the actual trajectory of the system. Each technique has distinct advantages for certain classes of molecules. For a given molecular system, MC methods require, in iterations, less computing time than MD. Compared to qualitative approaches like the correlations of group contributions, molecular modeling is more reliable given an accurate force field to describe inter- and intra-particle interactions.
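To illustrate what "random generation of configurations associated with a known probability" can mean in practice, the sketch below uses the Metropolis acceptance rule with Boltzmann weights. This particular rule, the one-dimensional harmonic test energy and all function names are assumptions made for illustration; the text does not specify which MC scheme the cited studies employ.

```python
import math
import random


def metropolis_accept(delta_energy, kT, rng=random):
    """Accept a trial move with probability min(1, exp(-delta_energy / kT))."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)


def sample_harmonic(n_steps, kT=1.0, max_step=0.5, seed=0):
    """Sample x under the toy energy U(x) = x**2 / 2 and return the mean of x**2."""
    rng = random.Random(seed)
    x = 0.0
    accumulated = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)
        delta = 0.5 * trial ** 2 - 0.5 * x ** 2
        if metropolis_accept(delta, kT, rng):
            x = trial
        accumulated += x * x  # rejected moves re-count the current state
    return accumulated / n_steps


print(sample_harmonic(20000))
```

An MD code would instead integrate the equations of motion for the same energy function, producing a time-ordered trajectory rather than a sequence of accepted and rejected trial configurations.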
Molecular MC and MD simulations are used to generate pseudo-experimental data over wide ranges of pressure and temperature, and to understand the macroscopic behavior of mixtures that include expensive novel products or toxic compounds, in place of costly experimental investigations. These simulations have been used to predict macroscopic properties in the oil and gas, cosmetics and pharmaceutical industries57 and more generally in the chemical industry,14 where MC methods are typically used to equilibrate solute-solvent systems. A broad spectrum of problems in chemical and material science, such as vapor–liquid equilibrium,55, 56 vapor–liquid–liquid equilibrium, solid–liquid equilibrium,58 supercritical solutions and ionic liquid properties;59 crystal growth and crystal orientation;21 the determination of activity coefficients;60 and transport properties such as viscosity, diffusivity and thermal conductivity,55, 56 can presently be addressed by molecular modeling based on force field information and ab initio parameters.
Meniai and Newsham60 were the first to employ molecular modeling along with GC for the evaluation of design parameters in solvent selection for liquid-liquid extraction. A molecular graphics system was used to avoid unusual combinations during the assembly of shortlisted groups (i.e. ensuring intermolecular stability) and to estimate unknown UNIQUAC (UNIversal QUAsi Chemical) interaction parameters. The evaluation of properties using this method was significantly complex, which limited the size of the search space that could be explored using conventional computational resources. Harper et al.61 modified this approach and developed a CAMD methodology that combines molecular modeling with GC. This method includes a structure generation algorithm, a large collection of property estimation methods and a link to molecular modeling tools. In contrast to conventional CAMD, which gives a list of possible candidates that need further investigation, this methodology, through its link to molecular modeling, delivers the result in the form of ready-to-use molecules. Stanescu and Achenie62 have proposed a two-step CAMD method in which candidate solvents are generated based on constraints on their physical properties, followed by density functional theory (a quantum mechanical modelling method) solvation calculations to estimate the reaction rate and product yield in the candidate solvents. A multi-scale model-based approach for predicting the physical properties of polymer repeat units, combining a CAMD technique based on the GC plus method with atomistic simulations, has also been put to use.63 The molecular simulations were capable of providing the physical properties of the polymer as a function of size (number of repeat units) and operational variables such as temperature and pressure.
Properties of Interest for Solvent Selection
This section deals with the properties of interest for solvent extraction and crystallization operations, and their estimation using the methodologies described in the previous section. The book by Poling et al.30 provides an abundance of thermodynamic and physical properties (such as viscosity, thermal conductivity, surface tension and diffusivity) for pure components and mixtures of various organic compounds. A detailed list of models developed for predicting the properties of interest for solvent selection, along with the descriptors considered for diverse organic compounds, has been reported in a recent review by Katritzky et al.44 Gu et al.64 carried out a statistical analysis of 96 pure solvents using eight solvent parameters (hydrogen bond acceptor propensity, hydrogen bond donor propensity, polarity/dipolarity, dipole moment, dielectric constant, viscosity, surface tension and cohesive energy density) to separate out the basic groups for the discovery of new polymorphs by crystallization. Table 1 reports the AAE of the property of interest for the various property estimation methods. Table 3 lists the properties of interest that are relevant to the context of this article.
Solvent Extraction
The physical properties of interest are generally classified into three categories: primary, secondary and mixture properties. For liquid-liquid extraction, the primary properties (such as boiling point, melting point, liquid density, viscosity and critical properties) can be predicted using the GC, Constantinou-Gani, MG and CI methods.31, 33, 35, 40 For practical liquid-liquid extraction, the ratio of the densities at the operating temperature must be at least 1.05, the melting point should be below the operating temperature, and the boiling point should be far above it. Secondary properties (such as vapour pressure and heat of fusion), which are predicted using analytical expressions and are functions of primary properties, can also be determined by the GC, MG and CI methods and their combinations.65-67 Mixture properties such as the activity coefficients involved in phase equilibrium are estimated using the UNIFAC method.32, 68
Infinite dilution activity coefficient
An important mixture property relevant to solvent extraction is the infinite dilution activity coefficient (γ∞). This property measures the interactions between solute and solvent molecules in the absence of solute-solute interactions, and gives an estimate of how the solvent medium differs from the pure solute. The activity coefficient also accounts for the non-ideal behaviour of a mixture and can be predicted with UNIFAC/modified UNIFAC using vapour-liquid equilibrium / liquid-liquid equilibrium (VLE/LLE) interaction parameters. For pharmaceutical molecules, the activity coefficient can be estimated using regular solution theory,57 UNIFAC, modified UNIFAC and COSMO-RS,69 and can in turn be used to predict solubility using equation 9, which states that the solid solubility (x_i) is a function of the activity coefficient (γ_i) and the pure component properties of the solute (the melting temperature T_m and heat of fusion ∆H_f).

$$\ln x_i = \frac{\Delta H_f}{R T_m}\left(1 - \frac{T_m}{T}\right) - \ln \gamma_i \qquad (9)$$
where R is the gas constant. The use of constitutive models for the phase equilibria of pure components and mixtures in pharmaceutical product-process design has recently been discussed in a review article by O’Connell et al.15 Various case studies have been discussed with detailed model relations for the VLE of oleum, formaldehyde and water; the VLE of carbon dioxide and hydrogen sulphide in aqueous amine solutions; and the LLE of carboxylic acids in aqueous solution. Further, various thermodynamic models such as perturbed-chain statistical associating fluid theory (PC-SAFT), non-random two-liquid segment activity coefficient (NRTL-SAC), conductor-like screening model segment activity coefficient (COSMO-SAC) and UNIFAC-GC can be used to predict γ∞ values. There are, however, many gaps in the UNIFAC parameter tables due to a lack of the necessary experimental data. Recently, a GC+ approach for predicting mixture properties, called UNIFAC-CI, which combines the UNIFAC-GC based activity coefficient model with valence CIs, has been developed.70 The properties estimated from the ratio of activity coefficients at infinite dilution in two phases are solvent loss, which must be as low as possible, and separation factor, solvent capacity and selectivity, which must be as high as possible. Pretel et al.71 have described solvent selection for extraction in detail and listed the dominant properties in separation problems as the solute distribution coefficient, solvent loss, solvent power and solvent selectivity.
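The solubility relation of equation 9, taken here as ln x_i = (∆H_f / (R T_m)) (1 − T_m / T) − ln γ_i, can be evaluated directly once an activity coefficient is available from one of the models above. The following is a minimal sketch; the function name is hypothetical and the solute data in the usage line are placeholders, not values for any real compound.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def solid_solubility(delta_h_fus, t_melt, temperature, gamma=1.0):
    """Mole-fraction solubility x_i from the assumed form of equation 9.

    delta_h_fus: heat of fusion of the solute, J/mol
    t_melt:      melting temperature of the solute, K
    temperature: system temperature, K
    gamma:       activity coefficient of the solute (1.0 = ideal solution)
    """
    ln_x = (delta_h_fus / (R * t_melt)) * (1.0 - t_melt / temperature)
    ln_x -= math.log(gamma)
    return math.exp(ln_x)


# Placeholder solute: 25 kJ/mol heat of fusion, melting at 400 K
print(solid_solubility(25000.0, 400.0, 298.15))             # ideal solubility
print(solid_solubility(25000.0, 400.0, 298.15, gamma=5.0))  # non-ideal case
```

With γ_i = 1 the expression reduces to the ideal solubility, which depends only on the solute; the solvent enters solely through the activity coefficient.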
Crystallization
The properties of interest for crystallization include the melting point (the temperature at which the solid and liquid phases exist in equilibrium), which in turn is used for the prediction of solubility, viscosity and heat of fusion. The boiling point indicates the volatility of a compound and is one of the important properties in characterizing a solvent. Other properties such as critical temperature, flash point and enthalpy of vaporization can be estimated from the boiling point. The dielectric constant, which measures the ability of a liquid to solvate a charged molecular species, is used to characterize the polarity of organic solvents. Stanescu and Achenie62 found that in solvents with a high dielectric constant, the yield is limited by the reversibility of the reaction.
Apart from basic physical properties, the complex physical properties involving solute/solvent and solvent/solvent interactions include the hydrogen bonding interaction parameter, octanol-water partition coefficient, solubility parameters (Hildebrand solubility parameter, Hansen solubility parameter), donor/acceptor numbers, solvatochromic parameters and other environment-related parameters.
Hydrogen bonding solubility parameter (δH)
24
This property quantifies the hydrogen bonding ability of a compound. Highly polar
25
solvents such as methanol have high δH values while non-polar solvents like hexane have very
26
low δH values. As the polarity of the solvent molecules increases, the hydrogen bonding
27
tendency between solute molecules decreases, which may affect the stimulation time of
28
nucleation or the polymorphic structure of the solutes. It is evaluated using GC,72 MG and MG-
29
CI52 and is a function of molar volume. 20 ACS Paragon Plus Environment
Page 23 of 68
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
Industrial & Engineering Chemistry Research
n-octanol/water partition coefficient (P)
This property is the ratio of the concentration of a compound in n-octanol to that in water at equilibrium. The logarithm of this partition coefficient (log P) is used in estimating numerous properties such as membrane transport and water solubility. The GC, MG, CI, MG-CI and structural analog approaches52, 73-75 are used to estimate log P values. Soskic and Plavsik76 have put forward a modeling approach for predicting log P from molecular CIs that uses empirically determined optimized weights, instead of the valence delta values, to characterize the skeletal atoms in a molecule, and obtained better estimates of this property.
Solubility parameter (δ)
The usefulness of δ lies in the ease with which relative solubility comparisons can be made. The smaller the difference in δ values, the greater the mutual solubility of two chemicals. This is, therefore, a comparative approach and not an absolute measurement. Since solubility is the prime property for solvent selection in crystallization, many research articles have been published over the last five years on the prediction of the solubilities of organic compounds in various solvents using different methods. The Hildebrand and Hansen solubility parameters are widely used to predict the solubility of solutes in solvents.58
18
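The comparison of δ values can be sketched numerically. The snippet below assumes two standard relations that are not stated in this review: the total (Hildebrand) parameter obtained from the three Hansen components, δ² = δD² + δP² + δH², and the Hansen distance Ra² = 4(δD1 − δD2)² + (δP1 − δP2)² + (δH1 − δH2)². The parameter values are approximate, handbook-style numbers in MPa^0.5 used only for illustration and should be checked against a reference table before any real use.

```python
import math

# Approximate Hansen parameters (dispersion, polar, hydrogen bonding), MPa^0.5.
# Illustrative values only.
METHANOL = (15.1, 12.3, 22.3)
ETHANOL = (15.8, 8.8, 19.4)
N_HEXANE = (14.9, 0.0, 0.0)

def hildebrand(hsp):
    """Total solubility parameter from the three Hansen components."""
    d, p, h = hsp
    return math.sqrt(d * d + p * p + h * h)

def hansen_distance(a, b):
    """Hansen distance Ra between two sets of parameters."""
    return math.sqrt(4 * (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)

# A smaller distance suggests better mutual solubility.
print(hansen_distance(METHANOL, ETHANOL))
print(hansen_distance(METHANOL, N_HEXANE))
```

With these numbers the methanol/ethanol distance is far smaller than the methanol/n-hexane distance, which is the kind of relative ranking the δ comparison is meant to provide.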
The prediction of pharmaceutical solubility using two thermodynamic models, NRTL-SAC and COSMO-SAC, which predict solubility from ab initio calculations, has been carried out for four compounds, namely lovastatin, simvastatin, rofecoxib and etoricoxib. The NRTL-SAC method was found to perform better than the COSMO-SAC model in rapidly screening solvents for the crystallization process.77 A review by Modaressi et al.78 on various models for the prediction of solid solubility in different organic solvents gives an insight into developments in the GC approach for the prediction of the three Hansen solubility parameters. To support solvent selection, various databases available in the literature and various methods to determine solubility were discussed in detail. Among the methods of property estimation, the GC, CI, UNIFAC-CI, NRTL-SAC and COSMO methods are discussed with regard to the prediction of phase equilibrium. Tsivintzelis et al.58 have modeled the solid-liquid equilibrium of pharmaceutical-solvent systems, accounting for their complex hydrogen bonding behavior, and fitted them to Hansen solubility parameters. Their methodology is applied to model the solubility of three pharmaceuticals, namely acetanilide, phenacetin and paracetamol, using the non-random hydrogen bonding equation of state. Their solubility predictions were satisfactory and compared well with those of the COSMO-RS method. Ruether and Sadowski79 applied a thermodynamic model based on PC-SAFT to correlate and predict the solubility of different drugs in pure solvents and solvent mixtures for the design of a crystallization process. Using this approach with very few pure component properties as input, the solubility predictions for various drug intermediates such as paracetamol, ibuprofen, sulfadiazine, p-hydroxyphenylacetic acid and p-aminophenylacetic acid matched the experimental results well. The solubilities of the active pharmaceutical ingredients aspirin, paracetamol and ibuprofen in various solvents were predicted using GC methods (UNIFAC and modified UNIFAC (Dortmund)) and a quantum chemical approach, the COSMO-RS (real solvents) method.69 Comparison with the experimental results showed that, among the three methods, the modified UNIFAC GC method provided the lowest root mean square deviations for temperature and solubilities and was able to accurately predict the solvent that shows high solubility, followed by UNIFAC and COSMO-RS.
Toxicity
Toxicity is the most important property when dealing with pharmaceuticals. As many organic solvents are toxic in nature, it is necessary to ensure that their end effects after use are considered. Until recently, GC was the only structure-property method widely used in the literature to predict the 96-h fathead minnow LC50, the concentration that is lethal to 50% of the test population.80 More recently, however, Hukkerikar and co-workers illustrated the use of property models based on the MG and MG-CI methods to estimate environment-related properties, together with the uncertainties of the estimated property values, through an application example. A total of 809 data points were used to estimate LC50.81 Apart from this, other properties related to environment, safety and hazard, such as flash point, corrosion and reactivity, need to be considered wherever solvents are used.
Table 3 lists the properties of interest for extraction and crystallization operations, and the structure-property relations available to estimate these properties. The properties stated above are of general interest for the selection of organic (aliphatic or aromatic) solvents. For inorganic solvents and ionic liquids, depending on their nature and interactions, the properties of interest differ even for extraction and crystallization operations.12, 82 For example, hydrogen bond basicity, hydrogen bond acidity and hydrophobicity are important considerations for an ionic liquid as an alternative solvent.83 Hence, there is a need for better structure-property correlations for individual properties of interest, and this requires further exploration.
Rational Solvent Design Approaches
Gani and his co-workers proposed a GC-based exhaustive search approach for molecular design.13 Their methodology starts with selecting the structural groups from the basis set (the set of all groups considered for designing a molecule) and generating a large number of combinations of the selected groups. This is followed by the constraint check.
Constraints
Three classes of constraints are considered in the literature for the solvent design problem, namely property constraints, structural feasibility constraints and industrial practicability constraints. The first two are the most commonly employed.
Property Constraints
Property constraints are inequality constraints in which the bounds on the properties of interest are specified with regard to process conditions. The properties of interest and their estimation methods have already been discussed in the previous sections; at this point it suffices to state that bounds are imposed on the estimated values of the properties of interest, expressed in terms of the selected set of groups.
Structural Constraints
The structural constraints are employed to generate a structurally viable molecule. Two different approaches for structural constraints are used in the literature. One is based on the octet rule, which ensures that the molecule as a whole does not have any free attachments, and is given by the following expressions,84
$$\sum_{i} \sum_{j} (2 - \nu_j)\, U_{ij} = 2m \tag{10}$$
$$\sum_{i} \sum_{j} U_{ij} = N_{max} \tag{11}$$
$$\sum_{j} U_{ij} = 1 \tag{12}$$
where νj is the valence of group j, Uij is a binary variable with i representing the position in the molecule and j representing the group in the basis set, Nmax represents the maximum number of positions in the molecule and m = 1, 0 and −1 for acyclic, monocyclic and bicyclic molecules, respectively. Note that this octet rule does not allow aromatic groups and most cyclic compounds, and cannot handle isomers. Eljack et al. have proposed a constraint based on the free bond number (FBN), which differentiates the aromatic, acyclic and cyclic groups and is given by,85
$$\sum_{g=1}^{N_g} n_g\, FBN_g = 2\left(\sum_{g=1}^{N_g} n_g - 1\right) + 2N_r \tag{13}$$
where Nr is the number of rings in the final molecule, ng is the number of groups ‘g’, FBNg is the number of free bonds in each group and Ng is the total number of basis groups chosen for the study.
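The two feasibility tests above can be sketched as follows. The group valences, the example molecules and the helper names are illustrative assumptions and are not taken from the cited works. The octet check sums (2 − νj) over the groups present and compares the result with 2m, taking m = 1, 0, −1 for acyclic, monocyclic and bicyclic molecules; the free-bond check is the balance of equation 13.

```python
# Illustrative valences (number of free attachments) for a few groups.
VALENCE = {"CH3": 1, "CH2": 2, "CH": 3, "OH": 1}

def octet_rule_ok(counts, m=1):
    """Octet rule: sum over groups of (2 - valence) * count must equal 2 m."""
    return sum((2 - VALENCE[g]) * n for g, n in counts.items()) == 2 * m

def free_bond_rule_ok(counts, n_rings=0):
    """Free bond balance: total free bonds == 2 (number of groups - 1) + 2 N_r."""
    free_bonds = sum(VALENCE[g] * n for g, n in counts.items())
    n_groups = sum(counts.values())
    return free_bonds == 2 * (n_groups - 1) + 2 * n_rings

propanol = {"CH3": 1, "CH2": 2, "OH": 1}   # CH3-CH2-CH2-OH, acyclic
cyclohexane = {"CH2": 6}                   # one ring
dangling = {"CH3": 1, "CH": 1}             # two attachments of CH left free

print(octet_rule_ok(propanol, m=1), free_bond_rule_ok(propanol))
print(octet_rule_ok(cyclohexane, m=0), free_bond_rule_ok(cyclohexane, n_rings=1))
print(octet_rule_ok(dangling, m=1), free_bond_rule_ok(dangling))
```

Both tests only count bonds, so they confirm that a set of groups can close into a molecule but say nothing about which isomer is formed.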
Vaidyanathan et al.86 have developed a set of structural constraints for UNIFAC groups used as base groups (groups chosen to form a molecule), in which these groups are classified into four classes and three types. The division of groups into classes is based on the valency of the group. The type of a group specifies whether its attachments belong to one or more atoms. For example, type 1 groups are those in which all the attachments belong to a single atom. The following structural feasibility constraints are employed for UNIFAC groups:
1. The sum of number of univalent groups and the trivalent groups is an even number.
2. Sum of valencies of all the groups in the molecule is greater than or equal to twice the maximal valency.
3. Sum of valencies of all the groups in the molecule is greater than or equal to twice the total number of groups less 2.
4. Sum of number of ternary groups and twice the number of quaternary groups is greater than or equal to number of univalent groups less 2.
5. The product of number of odd-valent groups less 2 and number of divalent groups of type 3 is greater than or equal to zero.
6. The product of number of odd-valent groups less 3 and number of trivalent groups of type 3 is greater than or equal to zero.
The other structural constraint approach is based on the adjacency matrix, which is a symmetric matrix that gives information about the connectivity of the groups in the molecule. The constraints are,23
1. The sum of the valence of the groups present in the molecule is equal to the sum of the elements present in the adjacency matrix.
$$w_i\, val_i = \sum_{j=1}^{i-1} a_{ji} + \sum_{j=i+1}^{N} a_{ij} \tag{14}$$
2. Sum of the upper triangular matrix elements in the adjacency matrix is equal to the total number of groups present in the molecule − 1 + number of rings. (If number of rings = 0, an acyclic molecule is obtained.)
$$\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} a_{ij} = n - 1 + N_r \tag{15}$$
3. Number of groups in the molecule
$$n = \sum_{i=1}^{N} w_i \tag{16}$$
where wi is a binary number which indicates the presence or absence of the ith group, vali is the valency of the ith group and aij is an element of the adjacency matrix in the ith row and jth column. N is the total number of groups in the basis set, n is the number of groups in the molecule and Nr is the number of rings.
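The adjacency-matrix constraints can be checked with a few lines of code. The sketch below is a simplified illustration that assumes every listed group is present (wi = 1 for all i, so N = n) and reads only the upper triangle of the matrix; the function name and the example molecule are hypothetical.

```python
def adjacency_ok(valences, adj, n_rings=0):
    """Check the valence balance per group and the total bond count."""
    n = len(valences)
    # Per-group balance: valence of group i equals the bonds recorded for it
    # in column i above the diagonal plus row i to the right of the diagonal.
    for i in range(n):
        bonds = sum(adj[j][i] for j in range(i))
        bonds += sum(adj[i][j] for j in range(i + 1, n))
        if bonds != valences[i]:
            return False
    # Total bond count: upper-triangular sum equals n - 1 + N_r.
    upper = sum(adj[i][j] for i in range(n - 1) for j in range(i + 1, n))
    return upper == n - 1 + n_rings

# CH3-CH2-CH2-OH as a chain of four groups with valences 1, 2, 2, 1.
valences = [1, 2, 2, 1]
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(adjacency_ok(valences, chain))
```

Unlike the bond-counting rules, the adjacency matrix records which group is attached to which, so different isomers correspond to different matrices.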
Apart from these general structural constraints, molecules are in general unstable if two heteroatoms are bonded to the same carbon atom and at least one heteroatom is also bonded to a hydrogen atom. If neither of the two heteroatoms is bonded to a hydrogen atom, the combination could lead to a stable molecule. Some combinations, such as peroxides (HO-OH), which are physically viable, should be avoided as they are highly reactive and are not generally considered as solvents.
Practicality Constraints
For industrial practicality of the solvent, there are two main constraints reported in the literature.8 One is that the molecule should be easy to synthesize and the other is that it should be stable. The more functional groups there are in a molecule, the more difficult it is to synthesize. Therefore, a limit on the number of kinds of functional groups should be one of the constraints. Molecules that cannot be synthesized easily (for example, molecules containing both aromatic and cycloalkyl groups) and are relatively expensive have to be eliminated with a constraint. The stability of the solvent is ensured by excluding unstable groups such as aldehyde, ethenyl and acetenyl, and by excluding more than three carbonaceous substitutions on a five- or six-membered ring. Properties of the structurally feasible molecules are then predicted using GC methods.
Optimization Approaches
The CAMD methodology can be classified into four categories: generate and test, mathematical optimization, a priori methods and combinatorial optimization approaches. A knowledge-based generate-and-test approach is composed of a set of rules for selecting groups, generating feasible molecules and rating them. This method does not guarantee that the solvents generated are optimal. In addition, it is time consuming because of the combinatorial explosion as the number of functional groups increases. The mathematical optimization approaches, including mixed integer non-linear programming (MINLP) and mixed integer linear programming (MILP), have difficulty in framing the exact expression of the structure-property relationships. A priori methods such as COSMO-RS (conductor-like screening model for real solvents) are based on quantum chemical calculations and offer alternatives to the more commonly used GC methods. Combinatorial optimization makes use of a stochastic approach and can handle the combinatorial complexity of molecular design.
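To give a feel for the combinatorial explosion mentioned above, the sketch below counts the unordered group combinations (multisets) that a generate-and-test scheme would have to screen. The basis-set sizes and the maximum molecule size are hypothetical; the count is taken before any feasibility check and ignores connectivity isomers, so it is only a rough indicator of the size of the search space.

```python
from math import comb

def n_group_combinations(n_basis: int, max_groups: int) -> int:
    """Number of multisets of 1..max_groups groups drawn from n_basis group types."""
    return sum(comb(n_basis + k - 1, k) for k in range(1, max_groups + 1))

# Hypothetical basis-set sizes, molecules of up to 8 groups.
for n_basis in (5, 10, 20, 40):
    print(n_basis, n_group_combinations(n_basis, 8))
```

The count grows polynomially in the number of group types for a fixed molecule size and very rapidly once both are allowed to grow, which is why exhaustive generation becomes impractical for large basis sets.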
CAMD methodology is based on different property estimation methods coupled with an optimization technique. Deterministic and stochastic optimization procedures are used to generate a list of the globally best molecules (based on the basic groups chosen) that exhibit a specified property. For non-linear (MINLP) problems, owing to the high level of complexity, neither class of method guarantees a global solution: there are many local minima along with the global minimum, and the solution is likely to get trapped in one of the local minima. In special cases these methods can guarantee a global solution, but such cases occur rarely in molecular design. These techniques may also encounter problems with combinatorial explosion or a discontinuous search space when solving an MINLP. One strategy to address this issue is to linearize the non-linear components and reformulate the original MINLP problem as an MILP problem. Another approach is to reduce the search space by exploiting the problem structure. When these problems are linearized (MILP), the global optimal solution is found through a deterministic approach. The stochastic methods, which generate and use random variables for optimization, can handle combinatorial explosion and a discontinuous search space because they are essentially combinatorial in nature, but they never guarantee convergence.
The types of problems that are encountered in optimization with discrete variables include mixed integer programming, binary integer programming, MILP and MINLP. The most general case is the mixed integer programming problem and is represented as follows:87
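$$\begin{aligned}
\text{Minimize:} \quad & f(x, y) \\
\text{Subject to:} \quad & h_i(x, y) = b_i, \quad i = 1, 2, \ldots, m \\
& g_j(x, y) \le c_j, \quad j = 1, 2, \ldots, r \\
& x = [x_1 \; x_2 \; \ldots \; x_n]^T, \quad y \in Y \text{ integer}
\end{aligned} \tag{17}$$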
where f(x, y) is the objective function, and hi(x, y) and gj(x, y) are the equality and inequality constraints, respectively, which can be either structural feasibility constraints or property constraints. x is a vector of continuous variables and y is a vector of integer variables. The functions f, g and h are assumed to be convex in the continuous variables and once continuously differentiable. However, depending on the structure of the optimization problem, there are ways to transform a non-convex problem into a convex problem.
A special case of integer programming in which all the elements of y are either 0 or 1 is called binary integer programming and is widely used to solve product design problems. Many mixed integer programming problems are linear in the objective function and constraints and are termed MILP problems. When the objective function and/or the constraints are non-linear, the problem is termed an MINLP problem.
Deterministic Approaches
Deterministic methods for the solution of MILP and MINLP problems include outer approximation (OA), branch and bound (BB), interval analysis and generalized Benders decomposition. Among these, BB and OA are the most commonly employed and are discussed below.
Branch & Bound
Branch and bound is a very effective method for solving MILP and MINLP problems. Consider a general MILP problem of the form of equation 17. The simplest way to solve such an integer optimization problem is to enumerate all integer points, discard the infeasible ones and identify the point that has the best objective function value among the feasible integer points. However, this approach is computationally expensive even for moderate-size problems. BB can be considered a refined enumeration method in which most of the non-promising integer points are discarded without even testing them.
In the BB method, the continuous problem obtained by relaxing the integer restrictions on the variables (i.e. y can be real) is solved first. If the solution happens to be integer, it is the optimal solution. Otherwise (i.e. if some yk is non-integer), the problem is split into two subproblems, one with the lower bound constraint yk ≥ [yk] + 1 and the other with the upper bound constraint yk ≤ [yk], where [yk] denotes the integer part of yk. This split removes part of the continuous space that is not feasible for the integer problem and, at the same time, ensures that none of the feasible integer solutions is eliminated. The process of branching continues until the optimal solution is found. This method can be computationally expensive when the number of branches becomes large.
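The relaxation, branching and discarding steps described above can be sketched on a small 0/1 knapsack problem. This is a maximization MILP chosen only because its continuous relaxation can be solved by a greedy rule without an LP solver; the data and function names are hypothetical. Because the example maximizes, the relaxation supplies an upper bound, whereas for a minimization problem it would supply a lower bound.

```python
from math import floor

def relax(values, weights, capacity, fixed):
    """Continuous relaxation of a 0/1 knapsack with some variables fixed.

    Returns (bound, y), where y has at most one fractional entry, or None
    if the fixed assignments already exceed the capacity.
    """
    y = [0.0] * len(values)
    room, total = capacity, 0.0
    for k, v in fixed.items():
        y[k] = float(v)
        room -= weights[k] * v
        total += values[k] * v
    if room < 0:
        return None
    free = [k for k in range(len(values)) if k not in fixed]
    free.sort(key=lambda k: values[k] / weights[k], reverse=True)
    for k in free:
        take = min(1.0, room / weights[k])
        y[k] = take
        total += values[k] * take
        room -= weights[k] * take
        if room <= 0:
            break
    return total, y

def branch_and_bound(values, weights, capacity):
    best_value, best_y = float("-inf"), None
    stack = [{}]                      # each node is a dict of fixed variables
    while stack:
        fixed = stack.pop()
        result = relax(values, weights, capacity, fixed)
        if result is None:
            continue                  # infeasible node
        bound, y = result
        if bound <= best_value:
            continue                  # cannot beat the incumbent: discard
        frac = [k for k, yk in enumerate(y) if abs(yk - round(yk)) > 1e-9]
        if not frac:                  # integer solution: new incumbent
            best_value, best_y = bound, [int(round(v)) for v in y]
            continue
        k = frac[0]
        stack.append({**fixed, k: floor(y[k])})       # branch y_k <= [y_k]
        stack.append({**fixed, k: floor(y[k]) + 1})   # branch y_k >= [y_k] + 1
    return best_value, best_y

print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

In this small instance the nodes whose relaxed bound is no better than the incumbent are dropped without further branching, which is the sense in which BB refines plain enumeration.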
BB uses the same method as described above to solve MINLP problems with non-linear relations among the continuous variables and linear binary or integer variables; the problem is represented by equation 18:87
$$\begin{aligned}
\text{Minimize:} \quad & z = f(x) + c^T y \\
\text{Subject to:} \quad & h_i(x) = 0, \quad i = 1, 2, \ldots, m \\
& g_i(x) + By \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y \in Y \text{ integer}
\end{aligned} \tag{18}$$
where x is a vector of continuous variables, y is a vector of integer (usually binary) variables, B is a matrix and X and Y are sets.
Outer Approximation
The outer approximation (OA) algorithm was developed by Duran and Grossmann in 1986.88 The basic idea of the OA algorithm is to decompose the MINLP model (equation 17) into an NLP primal problem and an MILP master problem. Each iteration of OA involves solving two subproblems. First, the problem is solved as an NLP(yk) by fixing the integer y variables at some set of values yk and optimizing over the continuous x variables between their bounds.
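NLP problem: Equation (18) is modified as:87
$$\begin{aligned}
\text{Minimize:} \quad & f(x) + c^T y^k \\
\text{Subject to:} \quad & h_i(x) = 0, \quad i = 1, 2, \ldots, m \\
& g_i(x) + By^k \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y^k \in Y \text{ integer}
\end{aligned} \tag{19}$$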
Then a linearization is carried out around the optimal solution, and the resulting constraints are added to the linear constraints that are already present. This new linear model is referred to as the master MILP problem.
MILP master problem: A new variable w is introduced to make the objective function linear.
$$\begin{aligned}
\text{Minimize:} \quad & w + c^T y \\
\text{Subject to:} \quad & w \ge f(x^i) + \nabla f^T(x^i)(x - x^i), \quad i = 1, 2, \ldots, m \\
& h(x^i) + \nabla h^T(x^i)(x - x^i) = 0, \quad i = 1, 2, \ldots, m \\
& g(x^i) + \nabla g^T(x^i)(x - x^i) + By \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y \in Y \text{ integer}
\end{aligned} \tag{20}$$
Here xi denotes the ith linearization point, i.e. the solution of a previously solved NLP primal problem, and h and g collect the equality and inequality constraint functions of equation 18.
For the minimization problem, the NLP primal problem provides an upper bound on the objective function, while the MILP master problem provides a lower bound. The primal and master problems are solved alternately until convergence. This method cannot guarantee a global optimal solution when the NLP subproblem is non-convex. A common strategy is to try several initial guesses to see if a consistent solution is obtained.
A mathematical programming approach to solve the CAMD problem for the design of solvents for extractive fermentation, employing the OA algorithm with equality relaxation and augmented penalty, has been proposed by Wang and Achenie.89 Initially, the problem is reformulated such that all the binary variables appear linearly, while the continuous variables can appear linearly or non-linearly. As the first step, a non-linear programming problem is solved; if the solution obtained is integer, the program terminates successfully. Otherwise, the MILP problem is solved by linearizing the non-linear constraints using slack variables (variables added to an inequality constraint to transform it into an equality constraint) and penalty weights.
Sinha et al.90 used the reduced space BB strategy for solvent design problems and introduced the idea of splitting functions, which result in a smaller number of branching nodes. A reduced dimension BB algorithm, in which branching is done only for a set of branching functions instead of all search variables, has been proposed.91 This methodology is applied to a case study in which an optimal solvent had to be designed to serve as a cleaning agent for the printing industry. The problem with 120 non-linear variables is solved with just four splitting variables. It is also observed that the computation time increases linearly with problem size. Sinha et al. developed an interval-based global optimization tool called LIBRA to solve the CAMD problem of designing solvent blends for blanket wash and arrive at a global optimal solution.92
A recent review by Floudas and Gounaris discusses the research progress in deterministic global optimization over a decade.93 The design of a biocompatible solvent for extractive fermentation and for an extractive distillation process to yield water-free ethanol is formulated as an MINLP problem by Cheng and Wang.94 A two-phase computational scheme was introduced with the relative volatility of the solvent as one of the prime property constraints. The mixed-integer hybrid differential evolution (MIHDE) algorithm was first applied to obtain a feasible solution. Then, to confirm whether the optimal design had been achieved, the feasible solution obtained from MIHDE was used as the starting point for a mixed-integer sequential quadratic programming solver. The optimal solvent obtained by this approach was found to be 3-octanone.
The identification of the three-dimensional molecular (crystal) structure that best fits X-ray diffraction measurements using a discrete and non-linear optimization approach is reported by Sahinidis.95 The model developed possesses multiple local minima because of non-convexities in the objective function. As the model does not impose atomicity constraints, its solutions do not guarantee that two atoms will not coincide. The addition of such constraints provided a complete problem formulation leading to the correct crystal structures.
Stochastic Approaches
Stochastic optimization techniques include simulated annealing (SA), genetic algorithm (GA) and Tabu search. These heuristic search methods can be applied to certain types of combinatorial problems when BB and OA are difficult to apply or converge too slowly. In Tabu search, random moves are performed while a short-term memory function keeps track of previous moves and prevents already visited solutions from being revisited. A long-term memory function determines the new starting point (one that has not been explored previously) and when to restart the entire procedure. This diversifies the search so that it spans the entire search space. The GA framework, being a multiple-point search technique, offers a number of advantages. It examines a set of solutions and not just one solution, and the stochastic nature of the algorithm helps the search to escape local minima. In addition, it is easy to apply as it is not a derivative-based technique. A detailed review of these three leading optimization techniques, along with their advantages and disadvantages, has been reported by Fouskakis and Draper.96 SA and GA are widely used stochastic optimization techniques in the literature on product design; hence the functioning of these methods is described in a little detail in the following sections.
Simulated Annealing
SA is a combinatorial optimization technique for solving unconstrained and bound-constrained optimization problems based on random evaluations of the objective function f and of the constraints (equation 17).87 This method usually requires a large number of function evaluations to find the optimal solution, but it is capable of reaching the global optimal solution even for ill-conditioned functions with multiple local minima. Also, the quality of the final solution is not affected by the initial guess. A new point is randomly generated in each iteration. The distance between the new point and the current point is based on a probability distribution. The global solution is sought through two steps. The first is the so-called "Metropolis algorithm", which helps in spanning the space of solutions with the following probability criterion
$$P(\Delta f) = e^{-\Delta D / kT} > R(0,1) \tag{21}$$
where ΔD is the change of distance implied by the move, k is the scaling factor called Boltzmann’s constant, T is the “synthetic temperature”, R(0,1) is a random number in the interval [0,1], Δf = f(Xi+1) − f(Xi) and P is the probability. D is called the cost function or performance objective function and corresponds to the free energy in the case of annealing a metal. The second step is, by analogy with the annealing of a metal, to lower the temperature. This notion of slow cooling is implemented in the SA algorithm as a slow decrease in the probability of accepting worse solutions as it explores the solution space.
The algorithm accepts all new points that lower the objective function value, along with a few points that raise the objective function, with a certain probability. By accepting points that raise the objective function, the algorithm avoids being trapped in local minima.
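The acceptance rule and the cooling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the scaling constant k is folded into the temperature, the cooling schedule is geometric, and the objective function, neighbour move and parameter values are all hypothetical.

```python
import math
import random

def simulated_annealing(f, x0, neighbour, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Minimize f by simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        cand = neighbour(x, rng)
        fc = f(cand)
        delta = fc - fx
        # Metropolis criterion: always accept improvements, accept a worse
        # point only if exp(-delta / T) exceeds a uniform random number.
        if delta <= 0 or math.exp(-delta / t) > rng.random():
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t = max(t * cooling, 1e-12)   # slow cooling lowers the acceptance probability
    return best, fbest

def step(x, rng, lo=0, hi=20):
    """Hypothetical neighbour move: one step left or right on an integer grid."""
    return min(hi, max(lo, x + rng.choice((-1, 1))))

def bumpy(x):
    """Hypothetical objective with several local minima."""
    return (x - 12) ** 2 + 15 * math.cos(x)

print(simulated_annealing(bumpy, 0, step, t0=50.0, cooling=0.995))
```

A higher starting temperature and a slower cooling rate allow more uphill moves early in the search, at the cost of more function evaluations.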
Genetic Algorithm
Tabu search and SA operate by transforming a single solution at a given step, whereas GA works with a set of solutions called a population.87 The GA begins with a population of ‘n’ chromosomes, which are random strings of binary digits (0 and 1) representing the decision variables. Each string is associated with a fitness value derived from the objective function, which is used in successive genetic operations. At each step, the GA randomly selects individuals from the current population, and these are then made to go through the process of evolution using three operators, namely selection, crossover and mutation, to create a new population. By using mutation and crossover functions, GA handles the linear and bound constraints and generates only feasible new points. The new population is then tested for termination; until the termination criterion is met, the population is iteratively operated on by the above three operators. One cycle of these three operations and the subsequent evaluation procedure is known as a generation. GA is the most widely used stochastic optimization technique in product design problems.
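One generation cycle of selection, crossover and mutation can be sketched as below. The specific operator choices (binary tournament selection, single-point crossover, bit-flip mutation and retention of the best string) and all parameter values are assumptions made for illustration; the cited works may use different operators. The fitness function here simply counts the ones in a string and stands in for a property-based objective.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      p_cross=0.9, p_mut=0.02, seed=0):
    """Maximize fitness over binary strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                      # keep the best string so far
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied independently to each position.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children                            # one generation completed
        best = max(pop, key=fitness)
    return best

print(genetic_algorithm(sum, 12))
```

In a molecular design setting each bit string would encode a choice of groups, and infeasible strings would have to be repaired or penalized before the fitness is evaluated.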
All the methods discussed above and their accuracies are limited by the availability of parameters of GC methods. For single-objective problems, the solution is unique as we are trying to minimize or maximize only one particular property in the solvent design problem. However, in a multi-objective optimization framework, the solution is in general not a single output but a Pareto set, which is obtained from the potential compromises among the objectives. Hence, the solvent design problem with a multi-objective function involves a non-convex set (a set of points in which not all segments connecting points of the set lie entirely in the set), and there are many local minima apart from the global minimum.
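The notion of a Pareto set can be made concrete with a short filter that keeps only the non-dominated candidates. The sketch assumes that every objective is to be minimized; the candidate values are hypothetical pairs (for example, a solvent loss measure and a toxicity index) and do not come from the cited studies.

```python
def pareto_front(points):
    """Return the non-dominated points when every objective is minimized."""
    def dominates(a, b):
        # a dominates b if it is no worse in all objectives and better in one.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective 1, objective 2) values for five candidate solvents.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))
```

Each point that survives the filter is a compromise: improving one objective from that point necessarily worsens the other.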
There is ample literature on the application of stochastic optimization techniques to the CAMD problem over the past decade. Venkatasubramanian et al.97 were among the first to employ a stochastic optimization technique, GA, for the polymer design problem, where properties are estimated using a GC method. Van Dyk and Nieuwoudt98 developed a GA-based CAMD tool called ‘SolvGen’ to design solvents and solvent mixtures for extractive distillation, azeotropic distillation, liquid extraction and liquid chromatography. The study revealed that solvent blends performed better than pure solvents for extractive distillation systems, and a few new solvents other than the classical solvents were listed for azeotropic distillation systems. A number of these predictions were also verified through experiments and the results were found to hold good. Lehmann and Maranas99 examined the combination of quantum chemical methods with multi-objective optimization for the design of solvents for liquid-liquid extraction, such as the benzene-cyclohexane system. GA was chosen as the optimization technique and is tuned based on GC methods. The tuned GA calls the quantum chemical subroutine to evaluate the properties of the generated molecules. A CAMD approach using a Tabu search algorithm based on the CI property estimation method, implemented with novel neighbour-generating operators such as swap and move, has been developed by Lin et al.100 to design transition metal catalysts. The Tabu lists helped in diversifying the search to cover the entire search space and locate the final solution precisely. The algorithm is also able to locate a large number of near-optimal solutions within a short time.
Wu et al.101 have proposed an improved genetic algorithm technique for CAMD, by
27
including the cross-generatant elitist selection, dislocation crossover, and mutation operators.
28
The results obtained by using this method for a known system agreed well with that reported in
29
literature. A review has been presented by Song and Song102 to design environmentally friendly
30
solvents for separation processes using CAMD approach based on SA technique and modified
31
UNIFAC GC method. The proposed methodology has shortened the computing time greatly and
32
few case studies have been illustrated for industrial problems with a single objective function. 33 ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 36 of 68
1
Serrato et al.103 proposed a new design strategy composed of sequential GA combined with
2
quantum chemical calculations for CAMD. This strategy is applied to design of solvents for
3
extraction of lactic and acetic acids, and it was found that aldehyde and acid groups are the most
4
important groups for the design.
For more than a decade, Diwekar’s group has done extensive work on deterministic and stochastic approaches for CAMD. Kim and Diwekar104 used the Hammersley stochastic annealing algorithm for CAMD to design solvents for the extraction of acetic acid from water. They used the UNIFAC model to estimate the infinite dilution activity coefficient and Hansen’s solubility parameter model for solubility prediction. Xu and Diwekar105 reported that the Hammersley stochastic GA performed better than the stochastic SA technique for the above-mentioned case study. A multi-objective efficient GA called MOEGA has been developed by the same group for solvent selection and solvent recycling.106 The algorithm uses the weighting method, and the weights are generated by the Hammersley sequence sampling technique. A larger number of Pareto sets was obtained than with the SA technique. A more recent review article has been published on developments in product and process design for environmental considerations. Unlike traditional design, such process design problems involve multiple objectives, which increases the complexity of the problem. The new approaches and algorithms that have been sought continuously over the past few years have been reviewed.107
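
The weighting method with Hammersley-generated weights mentioned above can be sketched as follows. The Hammersley construction itself is standard; the normalisation of the points into weight vectors, the candidate names S1 to S4 and their two objective values are assumptions made purely for illustration and are not taken from MOEGA.

```python
PRIMES = (2, 3, 5, 7, 11, 13)


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def hammersley(n_points, dim):
    """Return n_points Hammersley points in the unit hypercube of dimension dim."""
    points = []
    for i in range(n_points):
        tail = [radical_inverse(i, PRIMES[d]) for d in range(dim - 1)]
        points.append(tuple([i / n_points] + tail))
    return points


def weights_from_points(points, eps=1e-9):
    """Turn unit-cube points into positive weights summing to one (assumed scheme)."""
    weights = []
    for point in points:
        shifted = [u + eps for u in point]  # eps keeps every weight strictly positive
        total = sum(shifted)
        weights.append(tuple(s / total for s in shifted))
    return weights


# Hypothetical candidates with two objectives to be minimised (illustrative values).
CANDIDATES = {"S1": (1.0, 9.0), "S2": (3.0, 4.0), "S3": (8.0, 1.0), "S4": (6.0, 6.0)}


def weighted_sum_front(candidates, n_weights=16):
    """Collect the minimiser of the weighted objective sum for each weight vector."""
    n_objectives = len(next(iter(candidates.values())))
    selected = set()
    for w in weights_from_points(hammersley(n_weights, n_objectives)):
        best = min(
            candidates,
            key=lambda name: sum(wi * fi for wi, fi in zip(w, candidates[name])),
        )
        selected.add(best)
    return selected
```

Each weight vector yields one non-dominated candidate, so a well-spread set of weights tends to recover more distinct points of the Pareto set; the dominated candidate S4 is never selected.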
A hybrid optimization approach that uses both deterministic and stochastic techniques (OA and SA) to solve a solvent design problem has also been reported;108 however, it was noted that the approach has not been proven to give a globally optimal solution. A systematic deterministic and stochastic approach for solvent design that maximizes product formation by enhancing the reaction rate constant of the main reaction and suppressing by-product formation has been proposed by Folic et al.109 This led to an MINLP problem formulation, which is linear in the binary variables and is solved using the OA algorithm. The solvent design methodology is modified to fit the problem definition by building a reaction model from the rate constant data of the chosen molecules before optimizing the objective function. On the other hand, modifying the objective function (to the logarithm of the reaction rate constant) led to an MILP problem formulation, and a case study showed that the solution of this design problem can lead to solvents that potentially suppress undesired reactions. There was an overlap of the solvent candidates obtained from the deterministic and stochastic optimization approaches, indicating that the designed solvents are relatively insensitive to the optimization approach employed.
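
The step from an MINLP to an MILP described above can be made plausible with a short derivation. It assumes, for illustration only, that the logarithm of the rate constant is linear in a set of solvent properties P_p and that each property is a linear group-contribution sum over the group counts n_j; the coefficients c_p and a_pj are hypothetical symbols, and the actual model of Folic et al. is not reproduced here.

```latex
% Assumed model: \ln k is linear in the solvent properties P_p,
% and each P_p is linear in the group counts n_j.
\begin{align}
\ln k &= c_0 + \sum_{p} c_p P_p, \qquad P_p = \sum_{j} a_{pj}\, n_j \\
\ln k &= c_0 + \sum_{j} \Big( \sum_{p} c_p\, a_{pj} \Big) n_j \\
\arg\max_{n} \, k &= \arg\max_{n} \, \ln k
\end{align}
```

Under these assumptions the objective ln k is linear in the integer variables n_j, and because the logarithm is strictly increasing, the solvent that maximizes ln k also maximizes k.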
Solvent Design for Pharmaceuticals
Based on the discussion in earlier sections, it is evident that the need for solvent design for pharmaceuticals is immense and that various structure-property prediction methods and optimization techniques exist for designing a solvent with bounds on specific properties. This section summarizes the literature on solvent design for pharmaceuticals. Liquid-liquid extraction and crystallization are the most common and vital operations that occur in all pharmaceutical manufacturing processes; selection of the solvent for these operations is important because solvents influence the nature and quality of the finished product. More articles have been published on crystallization than on extraction because, for most pharmaceuticals, crystallization is the final process operation, where all the impurities accumulated along the process flow are present, which leads to more complexities. Table 4 gives a summary of the various case studies reported in the literature on solvent design, with the properties of interest, the property estimation method and the optimization approach used for extraction and crystallization operations.
Solvent extraction
A review on strategies for solvent selection and high pressure liquid chromatography mobile phase optimization has been reported by Barwick110 to select alternative solvents that meet the performance requirements of toxicity, flammability and cost for solvent extraction and liquid chromatography. The solvents have been classified according to their polarity and selectivity, using the Hildebrand and Rohrschneider polarity schemes as the basis for solvent selection. Cismondi and Brignole111 have proposed an efficient search algorithm for the design of branched molecules, which used GC based on the electronegativities of each group to predict mixture and pure component properties. The algorithm used for the study was the knowledge-based generate-and-test approach, which generates chemicals by means of expert rules but cannot guarantee that the solvents are optimal. Gani et al.9 proposed a method for the selection of green solvents for the promotion of organic reactions occurring in the liquid phase, based on reaction solvent properties such as reactivity of the solvent, phase split, solubility, selectivity, toxicity, etc., with four illustrative examples. This methodology employs estimation of thermodynamic properties to generate a knowledge base of reactions using reaction indices and the MG method with the ProCAMD database. The constraints are based on solvent- and environment-related properties that directly or indirectly influence the rate and/or conversion of a given reaction. One of the case studies was on the replacement of dichloromethane as the solvent in oxidation reactions, where the solvent needs to dissolve 3-octanol and 3-octanone, have a density close to that of water, and be liquid at room temperature. ProCAMD generated 2-pentanone as the optimal solvent.
Crystallization
Crystallization, a key and complex operation in pharmaceutical processes, is a good example to illustrate how process knowledge can be used in the selection of solvents. The crystal morphology affects dissolution characteristics, bioavailability, solubility and the ease with which the crystals can be compressed into tablets. As pharmaceutical molecules are organic in nature, studies are restricted to crystallization in organic solvents, as the solubility of the organic solute in aqueous solvents is poor. It is well known that different classes of molecules give rise to different crystal morphologies with a single solvent. The crystal morphology of many organic solutes is strongly influenced by the solvent used for crystallization and its polarity. Highly polar solvents tend to produce crystals with a low aspect ratio, and non-polar solvents crystals with a high aspect ratio.19 Further, Gernaey and Gani112 presented a systematic model-based approach for pharmaceutical product design and analysis consisting of a modeling tool, a knowledge base, computer-aided methods and tools, and a user interface. The application of the framework has been demonstrated with examples of crystallization and fermentation processes.
A review on the methods used to estimate the solubility of organic solids in a wide variety of solvents for the crystallization process has been presented by Frank et al.113 The screening of solvents has been performed using Hildebrand and Hansen solubility parameters. In a particular example, the Hansen solubility parameters for aspirin were regressed using four different solvents (acetone, chloroform, ethanol and cyclohexane). Once the parameters have been identified, a quick estimate of the solubility of aspirin in any solvent can be obtained, provided the solubility parameters are also available for that solvent. It was found that neither the Hansen model nor the UNIFAC model provided adequate quantitative results because of the multifunctional nature of pharmaceutical molecules. Kolar et al.57 addressed the problem of solvent selection for pharmaceuticals by investigating the solubility of pharmaceutical compounds in a wide range of solvents of varying polarity and hydrogen bonding tendency. The estimation of solubilities by the regular solution theory approach, GC methods, QSPR methods, and molecular simulation methods has been summarized. Their study was focused on small to medium size aromatic and heterocyclic compounds. Abildskov and O’Connell114 suggested an approach which uses the GC method for predicting the solubility of sparingly soluble pharmaceutical compounds for solvent design problems. To minimize the number of adjustable parameters and reduce the uncertainty, an optimal reference solvent procedure has been reported, in which the difference in solubility at infinite dilution between the solvent of interest and the optimal reference solvent is computed. This methodology has been applied to predict the solubility of various compounds such as ephedrine, hydrocortisone, salicylic acid, niflumic acid, diuron and monuron. The optimal solvents obtained from this approach were found to be in good agreement with the experimental measurements. In addition, this method was found to effectively eliminate errors in pure-solute properties and many binary interaction parameters.
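
Why a reference solvent removes the dependence on pure-solute properties can be seen from the standard solid-liquid equilibrium relation. The derivation below is a sketch under the assumption that the solute (component 1) is sparingly soluble, so that its activity coefficient in solvent i can be replaced by the infinite-dilution value; the superscripts (i) and (r) for the solvent of interest and the reference solvent are notation introduced here, and the exact formulation of Abildskov and O’Connell may differ.

```latex
% Solid-liquid equilibrium: the right-hand side depends only on pure-solute
% properties (melting temperature, enthalpy of fusion) and temperature.
\begin{align}
\ln\!\big( x_1^{(i)} \gamma_1^{(i)} \big) &= \ln x_1^{\mathrm{id}}(T) \\
% Writing the same relation for the reference solvent r and subtracting,
% with \gamma_1 \approx \gamma_1^{\infty} for a sparingly soluble solute:
\ln x_1^{(i)} - \ln x_1^{(r)} &= \ln \gamma_1^{\infty,(r)} - \ln \gamma_1^{\infty,(i)}
\end{align}
```

The ideal-solubility term cancels in the difference, so only a measured solubility in the reference solvent and the two infinite-dilution activity coefficients are needed.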
Further, the difference in solubility ranges for polyphenols with variation in the polarity of the solvent (ethanol-water mixture) has been estimated by Savova et al.115 using Hansen solubility parameters obtained by the GC method. The calculated δ values, when compared with the experimental solubility profile, were found to be in good agreement for ideal mixtures, but showed some shortcomings because of kinetic effects in the case of non-ideal mixtures, multi-component systems and diffusion-dominated processes.
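
A screening of ethanol-water blends by Hansen solubility parameters can be sketched as below. The Hansen distance formula and volume-fraction blending are common practice; the solvent parameter values are approximate literature figures that should be checked against a vetted table, and the SOLUTE values are hypothetical placeholders, not data from Savova et al.

```python
from math import sqrt

# Approximate Hansen parameters (dispersion, polar, hydrogen bonding) in MPa^0.5.
# Treat these as illustrative; verify against a trusted table before real use.
ETHANOL = (15.8, 8.8, 19.4)
WATER = (15.5, 16.0, 42.3)

# Hypothetical solute parameters, chosen only to make the example concrete.
SOLUTE = (18.0, 10.0, 20.0)


def blend(hsp_a, hsp_b, phi_a):
    """Volume-fraction average of the Hansen parameters of two solvents."""
    return tuple(phi_a * a + (1.0 - phi_a) * b for a, b in zip(hsp_a, hsp_b))


def hansen_distance(solute, solvent):
    """Hansen distance Ra; a smaller value suggests better solute-solvent affinity."""
    d_disp, d_polar, d_hbond = (s - t for s, t in zip(solute, solvent))
    return sqrt(4.0 * d_disp ** 2 + d_polar ** 2 + d_hbond ** 2)


def screen(solute, steps=10):
    """Return (ethanol volume fraction, Ra) pairs from pure water to pure ethanol."""
    results = []
    for k in range(steps + 1):
        phi = k / steps
        results.append((phi, hansen_distance(solute, blend(ETHANOL, WATER, phi))))
    return results
```

For the placeholder solute, the distance to pure water is far larger than the distance to pure ethanol, which is the kind of trend such a screening is meant to expose; it says nothing about kinetic effects, which the text above identifies as a limitation.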
Hydrogen bonding plays a critical role in solvent selection for crystallization, and δH of solvents can be correlated to crystal morphology.72 Many publications have focused on ibuprofen as the model pharmaceutical compound. A CAMD framework has been proposed by Karunanithi et al.19 to design solvents with δH as a key property for the desired ibuprofen crystal morphology. The methodology has been illustrated by designing an optimal solvent for a cooling crystallization process. The CAMD problem has been formulated as an MINLP model with two different objectives: one maximizing potential recovery as the performance objective, and the other minimizing toxicity. Structural constraints based on the octet rule have been employed. Further, properties such as solubility, flash point, toxicity, viscosity, normal boiling point and melting point have been posed as property constraints. The solubility has been estimated using the regular solution theory approach (and equation 9), as discussed in a previous section of this article. All the properties have been estimated using GC-based methods. The decomposition-based approach has been used to decompose the MINLP model into sub-problems for the generation of optimal solvent molecules. It was observed that the ibuprofen crystals obtained from solvents with high hydrogen bonding ability were plate-like, with a low aspect ratio and large size. On the other hand, ibuprofen crystallized from solvents with low hydrogen bonding ability formed needle-like crystals with a high aspect ratio. The optimal solvent for the cooling crystallization process was found to be methoxymethyl 2-ethoxy acetate for maximizing potential recovery and methoxy (2-methoxyethoxy) methane for minimizing toxicity. The performance of the solvents was verified qualitatively through SLE diagrams and, more recently, through experiments.116
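
A structural constraint based on the octet rule can be checked with a few lines of code. The sketch below uses a commonly quoted form of the rule for group-based molecular design, sum over groups of (2 - valence) times count equals 2m, with m = 1 for acyclic, 0 for monocyclic and -1 for bicyclic structures; whether this is exactly the constraint set used in the framework above is an assumption, and the small VALENCE table is illustrative.

```python
# Number of free attachments (valence) of each group; illustrative subset only.
VALENCE = {"CH3": 1, "CH2": 2, "CH": 3, "C": 4, "OH": 1, "CH3CO": 1, "COOH": 1}


def satisfies_octet_rule(counts, rings=0):
    """Check sum_j (2 - v_j) * n_j == 2 * m, where m = 1 - rings.

    counts maps group name -> number of occurrences in the candidate molecule;
    rings is 0 for acyclic, 1 for monocyclic and 2 for bicyclic structures.
    """
    m = 1 - rings
    balance = sum((2 - VALENCE[group]) * n for group, n in counts.items())
    return balance == 2 * m
```

For example, satisfies_octet_rule({"CH3": 1, "CH2": 1, "OH": 1}) accepts the ethanol-like combination, while {"CH3": 1, "CH2": 1} is rejected because one attachment is left unfilled.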
The relation between crystal size and the hydrogen bonding ability of the solute with the solvent has been debated in the literature.20,21,117 The belief has been that hydrogen bonding is a more important parameter than the polarity of the solvent. In addition, the nucleation and growth towards the formation of a specific polymorph are affected by hydrogen bonding. A method based on atomic electronegativity, which calculates the partial charge distribution in the solute and solvent molecules to develop correlations and predict the hydrogen bonding ability of the solute and/or solvent molecule, has been proposed by Mirmehrabi and Rohani.117 A case study has been reported using this approach to screen solvents for crystallization of the drug ranitidine hydrochloride using a database search. It was found that solvents with higher dipole moments influence the hydrogen bonding between solute molecules.
Further, the effect of the solvent on the shape of the crystal has been studied, with experimental validation of the proposed CAMD framework using the decomposition-based approach.116 The combined approach of experiments, database search and CAMD found and verified that ibuprofen crystals formed from 2-ethoxy ethyl acetate as solvent are significantly larger and have a lower aspect ratio than crystals formed from n-hexane. In addition, a combined approach of experiments and CAMD for the design of solvents for crystallization of carboxylic acids has shown that different solute molecules of the same class (carboxylic acids) show different crystal morphologies with a single solvent component.20 It has been observed from experimental studies that the aspect ratio of the crystals does not always have an inverse relationship with δH. Moreover, solvents exhibiting high intermolecular hydrogen bonding tend to be viscous, which leads to the conclusion that δH alone should not be taken as a criterion to shortlist solvent molecules. The database search approach resulted in monohydric alcohols as the optimal solvents for the crystallization of sebacic acid, with potential recovery as the objective function and melting point, boiling point, solubility, δH, viscosity, flash point and toxicity as property constraints. Acquah et al.118 have identified the “acceptance number” as an index that substantially captures the ibuprofen-solvent hydrogen bonding interactions, apart from dipole moment and solubility, by fitting linear regression models to the experimental data and the solvent properties. The predictions based on the model have been found to be in good agreement with experimentally determined aspect ratio data from the literature. A flowchart has been proposed for various solvent categories and subcategories based on acceptance number and intermolecular interaction, respectively.
The use of mixed solvents for crystallization has been of interest to quite a few authors in the literature.19,77,119,120 Its applications in improving the physical and chemical properties of the crystal and solvent, and its ability to dissolve certain substances, make its uses manifold. Winn and Doherty presented a review on modeling of crystal shapes of organic materials grown from solution.119 It has been observed that crystals grown from a mixture of solvents have different characteristics from crystals grown from a pure solvent, and this effect is significant if the solute has very different solubility in each solvent. On the other hand, the study by Zilnik et al.120 on the solubility of diclofenac in a mixture of dichloromethane and dimethyl sulfoxide resulted in decolourization of the solution, suggesting instability of the solute in the mixed solvent. In continuation of the earlier discussion on the design of a solvent for the desired ibuprofen crystal morphology by Karunanithi et al.,19 a case study on the design of a solvent/anti-solvent mixture for the drowning-out technique has been carried out. The results indicate that 3-(ethoxymethoxy) propanal and butane-1,2,4 triol were the optimal solvent and anti-solvent pair when potential recovery was maximized.
Recently, a case study on solvent design using CAMD for improving the crystal morphology of 2,6-dihydroxybenzoic acid (DHB) has been reported.21 As a preliminary step, a database search was performed to select solvents with different properties such as polarity, hydrogen bonding ability, aromaticity, etc. Further, MD simulations were carried out to understand the interactions at solvent-crystal interfaces, and a modified attachment energy model was used to estimate the aspect ratio of crystals in various solvents. From this approach, it was found that a diethyl ether/toluene mixture with a mole ratio of 1:4 reduced the aspect ratio of the DHB crystals, and this was confirmed experimentally. Therefore, it can be concluded that the choice of solvent plays a significant role in the final crystal morphology and that the CAMD approach has the potential to aid in the rational selection of solvents for improving the morphology of the crystal. However, it should be noted that this is not the only approach to designing crystallization processes for a desired crystal morphology. Other factors that tailor the crystal morphology are pH, temperature, supersaturation, mixing intensity and seed crystals,121,122 and these have to be taken into account while designing the process.
Summary
This article presents an extensive literature survey on chemical product design, property prediction methods and optimization approaches employed for product design. Amongst the methods for prediction of properties from molecular structure, GC is the oldest and most widely used. However, its usage is restricted by the limited availability of reliable group contributions. The GIC method gives better estimates than GC but requires more model parameters for property estimation. The CI method, which makes use of “molecular descriptors”, has better prediction accuracy than GC and GIC, but knowledge of the regression coefficients is essential for property prediction. The MG method, with inclusion of higher-order group contributions, has been widely used in recent literature for property prediction because of its low AAE. The combined methods (GC-CI, UNIFAC-CI and MG-CI), in which a missing group’s contribution is predicted by CI, can sometimes lead to high AAE in property estimation. A brief insight into the use of molecular simulation with quantum mechanical calculations for the prediction of properties is also given. Although the resultant molecules obtained by this method are more reliable and can be put directly to use, the method is computationally expensive. The scope for estimating properties from chemical structure using more accurate thermodynamic models, ab initio calculation methods, well-correlated topological indices, etc., is extensive and is a great challenge for future research.
The properties important for solvent design (as reported in the literature) include solubility and the hydrogen bonding interaction parameter for the crystallization process, and the activity coefficient involved in phase equilibrium for extraction. Ionic liquids tend to find wide applications in pharmaceuticals and, hence, can also be considered in the design of new solvent molecules. The properties of interest in choosing the optimal ionic liquid may differ from those of interest for organic solvents, and there is a need for further investigation.
Further, literature on numerous optimization techniques that are used to solve the solvent design problem has been discussed. The problem formulation, incorporating definitions of different constraints such as property constraints, structural feasibility constraints, etc., is presented in detail. Both stochastic and deterministic approaches have been widely used in the literature. Many developments have been made, and are still underway, to broaden the search direction so as to guarantee a globally optimal solution.
Rational solvent design can be further extended to the process design problem by choosing the operating conditions, such as temperature and pressure, and by accounting for what follows in the process. This sequential decision making can lead to better performance, as it includes the design space and incorporates the intrinsic links between molecules and process.
Acknowledgement
The author M. Harini is grateful to the Council of Scientific and Industrial Research (CSIR), New Delhi, for financial support.
References
1. Moggridge, G. D.; Cussler, E. L., An introduction to chemical product design. Chem. Engg. Res. Des. 2000, 78, 5.
2. Hill, M., Product and process design for structured products. AICHE J. 2004, 50, 1656.
3. Harper, P. M.; Gani, R., A multi-step and multi-level approach for computer aided molecular design. Comp. Chem. Engg. 2000, 24, 677.
4. Charpentier, J. C., The triplet "molecular processes-product-process" engineering: the future of chemical engineering? Chem. Engg. Sci. 2002, 57, 4667.
5. Karunanithi, A. T.; Achenie, L. E. K.; Gani, R., A new decomposition-based computer-aided molecular/mixture design methodology for the design of optimal solvents and solvent mixtures. Ind. Engg. Chem. Res. 2005, 44, 4785.
6. Smith, B. V.; Ierapepritou, M. G., Integrative chemical product design strategies: Reflecting industry trends and challenges. Comp. Chem. Engg. 2010, 34, 857.
7. Bommareddy, S.; Chemmangattuvalappil, N. G.; Solvason, C. C.; Eden, M. R., Simultaneous solution of process and molecular design problems using an algebraic approach. Comp. Chem. Engg. 2010, 34, 1481.
8. Conte, E.; Gani, R.; Ka Ming, N., Design of Formulated Products: A Systematic Methodology. AICHE J. 2011, 57, 2431.
9. Gani, R.; Jimenez-Gonzalez, C.; Constable, D. J. C., Method for selection of solvents for promotion of organic reactions. Comp. Chem. Engg. 2005, 29, 1661.
10. Sahinidis, N. V.; Tawarmalani, M.; Yu, M. R., Design of alternative refrigerants via global optimization. AICHE J. 2003, 49, 1761.
11. Camarda, K. V.; Maranas, C. D., Optimization in polymer design using connectivity indices. Ind. Engg. Chem. Res. 1999, 38, 1884.
12. McLeese, S. E.; Eslick, J. C.; Hoffmann, N. J.; Scurto, A. M.; Camarda, K. V., Design of ionic liquids via computational molecular design. Comp. Chem. Engg. 2010, 34, 1476.
13. Gani, R., Computer-aided methods and tools for chemical product design. Chem. Engg. Res. Des. 2004, 82, 1494.
14. Abildskov, J.; Kontogeorgis, G. M., Chemical product design - A new challenge of applied thermodynamics. Chem. Engg. Res. Des. 2004, 82, 1505.
15. O'Connell, J. P.; Gani, R.; Mathias, P. M.; Maurer, G.; Olson, J. D.; Crafts, P. A., Thermodynamic Property Modeling for Chemical Process and Product Engineering: Some Perspectives. Ind. Engg. Chem. Res. 2009, 48, 4619.
16. Henderson, R. K.; Jimenez-Gonzalez, C.; Constable, D. J. C.; Alston, S. R.; Inglis, G. G. A.; Fisher, G.; Sherwood, J.; Binks, S. P.; Curzons, A. D., Expanding GSK's solvent selection guide - embedding sustainability into solvent selection starting at medicinal chemistry. Green Chem. 2011, 13, 854.
17. Nicponski, D. R.; Ramachandran, P. V., The role of solvent selection at exploratory and production stages in the pharmaceutical industry. Future Med. Chem. 2011, 3, 1469.
18. Grodowska, K.; Parczewski, A., Organic solvents in the pharmaceutical industry. Acta Poloniae Pharmaceutica 2010, 67, 3.
19. Karunanithi, A. T.; Achenie, L. E. K.; Gani, R., A computer-aided molecular design framework for crystallization solvent design. Chem. Engg. Sci. 2006, 61, 1247.
20. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K.; Sithambaram, S.; Suib, S. L., Solvent design for crystallization of carboxylic acids. Comp. Chem. Engg. 2009, 33, 1014.
21. Chen, J.; Trout, B. L., Computer-Aided Solvent Selection for Improving the Morphology of Needle-like Crystals: A Case Study of 2,6-Dihydroxybenzoic Acid. Crys. Growth & Des. 2010, 10, 4379.
22. Chow, K.; Tong, H. H. Y.; Lum, S.; Chow, A. H. L., Engineering of pharmaceutical materials: An industrial perspective. J. Pharma. Sci. 2008, 97, 2855.
23. Siddhaye, S.; Camarda, K.; Southard, M.; Topp, E., Pharmaceutical product design using combinatorial optimization. Comp. Chem. Engg. 2004, 28, 425.
24. Eladio, P. F., Using the Group-Interaction Contribution Approach (GIC) in mixtures - 1. Prediction of azeotropic parameters. Chem. Engg. Comm. 1998, 169, 1.
25. Bek-Pedersen, E.; Gani, R., Design and synthesis of distillation systems using a driving-force-based approach. Chem. Engg. Proc. 2003, 43, 251.
26. Eden, M. R.; Jorgensen, S. B.; Gani, R.; El-Halwagi, M. M., A novel framework for simultaneous separation process and product design. Chem. Engg. Proc. 2004, 43, 595.
27. Papadopoulos, A. I.; Linke, P., Integrated solvent and process selection for separation and reactive separation systems. Chem. Engg. Proc. 2009, 48, 1047.
28. Adjiman, C.; Galindo, A., Molecular systems engineering. Wiley-VCH: 2010; Vol. 6.
29. Deal, C. H.; Derr, E. L., Group Contribution in mixtures. Ind. Engg. Chem. 1968, 60, 28.
30. Poling, B. E.; Prausnitz, J. M.; O'Connell, J. P., The properties of gases and liquids. 5th ed.; McGraw-Hill: New York, 2001.
31. Joback, K. G.; Reid, R. C., Estimation of pure component properties from group contributions. Chem. Engg. Comm. 1987, 57, 233.
32. Fredenslund, A.; Gmehling, J.; Michelsen, M. L.; Rasmussen, P.; Prausnitz, J. M., Computerized design of multicomponent distillation-columns using UNIFAC group contribution method for calculation of activity-coefficients. Ind. Engg. Chem. Proc. Des. Dev. 1977, 16, 450.
33. Constantinou, L.; Bagherpour, K.; Gani, R.; Klein, J. A.; Wu, D. T., Computer aided product design: Problem formulations, methodology and applications. Comp. Chem. Engg. 1996, 20, 685.
34. Eladio, P. F.; Ramon, G. R., A group-interaction contribution approach. A new strategy for the estimation of physico-chemical properties of branched isomers. Chem. Engg. Comm. 1998, 163, 245.
35. Marrero, J.; Gani, R., Group-contribution based estimation of pure component properties. Fluid Phase Equilib. 2001, 183, 183.
36. Wang, Q.; Ma, P.; Neng, S., Position Group Contribution Method for Estimation of Melting Point of Organic Compounds. Chin. J. Chem. Engg. 2009, 17, 468.
37. Kier, L. B.; Hall, L. H., Molecular connectivity in chemistry and drug research. Academic Press: New York, 1976.
38. Lukovits, I.; Nikolic, S.; Trinajstic, N., On relationships between vertex-degrees, path-numbers and graph valence-shells in trees. Chem. Phys. Lett. 2002, 354, 417.
39. Hall, L. H.; Kier, L. B., Issues in representation of molecular structure - The development of molecular connectivity. J. Mol. Graphics & Modeling 2001, 20, 4.
40. Kier, L. B.; Hall, L. H., The meaning of molecular connectivity: A bimolecular accessibility model. Croatica Chemica Acta 2002, 75, 371.
41. Hu, Q.-N.; Liang, Y.-Z.; Fang, K.-T., The Matrix Expression, Topological Index and Atomic Attribute of Molecular Topological Structure. J. Data Sci. 2003, 1, 361.
42. Bicerano, J., Prediction of polymer properties. 3rd ed.; Marcel Dekker: New York, 2002.
43. Balaban, A. T., Can topological indices transmit information on properties but not on structures? J. Comp. Aided Mol. Des. 2005, 19, 651.
44. Katritzky, A. R.; Kuanar, M.; Slavov, S.; Hall, C. D.; Karelson, M.; Kahn, I.; Dobchev, D. A., Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chem. Rev. 2010, 110, 5714.
45. Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V.; Radchenko, E.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V., Virtual computational chemistry laboratory - design and description. J. Comp. Aided Mol. Des. 2005, 19, 453.
46. Xiong, Y.; Ding, J.; Yu, D. H.; Peng, C. J.; Liu, H. L.; Hu, Y., Volumetric Connectivity Index: A New Approach for Estimation of Density of Ionic Liquids. Ind. Engg. Chem. Res. 2011, 50, 14155.
47. Marrero-Ponce, Y.; Martinez-Albelo, E. R.; Casanola-Martin, G. M.; Castillo-Garit, J. A.; Echeveria-Diaz, Y.; Zaldivar, V. R.; Tygat, J.; Borges, J. E. R.; Garcia-Domenech, R.; Torrens, F.; Perez-Gimenez, F., Bond-based linear indices of the non-stochastic and stochastic edge-adjacency matrix. 1. Theory and modeling of ChemPhys properties of organic molecules. Mol. Diversity 2010, 14, 731.
48. Lu, C. H.; Guo, W. M.; Hu, X. F.; Wang, Y.; Yin, C. S., A novel Lu index to QSPR studies of aldehydes and ketones. J. Math. Chem. 2006, 40, 379.
49. Wiener, H., Correlations of heats of isomerization, and differences in heats of vaporization of isomers among paraffin hydrocarbons. J. Am. Chem. Soc. 1947, 69, 2636.
50. Yang, F.; Wang, Z. D.; Huang, Y. P.; Zhu, H. L., Novel topological index F based on incidence matrix. J. Comput. Chem. 2003, 24, 1812.
51. Yang, F.; Wang, Z. D.; Huang, Y. P., Modification of the Wiener index 4. J. Comput. Chem. 2004, 25, 881.
52. Hukkerikar, A. S.; Sarup, B.; Kate, A. T.; Abildskov, J.; Sin, G.; Gani, R., Group-contribution+ (GC+) based estimation of properties of pure components: Improved property estimation and uncertainty analysis. Fluid Phase Equilib. 2012, 321, 25.
53. Weis, D. C.; Visco, D. P., Computer-aided molecular design using the Signature molecular descriptor: Application to solvent selection. Comp. Chem. Engg. 2010, 34, 1018.
54. Chemmangattuvalappil, N. G.; Solvason, C. C.; Bommareddy, S.; Eden, M. R., Reverse problem formulation approach to molecular design using property operators based on signature descriptors. Comp. Chem. Engg. 2010, 34, 2062.
55. Allen, M. P.; Tildesley, D. J., Computer simulations of liquids. Clarendon Press, Oxford: 1987.
56. Frenkel, D.; Smit, B., Understanding molecular simulation. Academic Press: San Diego, 2002.
57. Kolar, P.; Shen, J. W.; Tsuboi, A.; Ishikawa, T., Solvent selection for pharmaceuticals. Fluid Phase Equilib. 2002, 194, 771.
58. Tsivintzelis, I.; Economou, I. G.; Kontogeorgis, G. M., Modeling the Solid-Liquid Equilibrium in Pharmaceutical-Solvent Mixtures: Systems with Complex Hydrogen Bonding Behavior. AICHE J. 2009, 55, 756.
59. Izgorodina, E. I., Towards large-scale, fully ab initio calculations of ionic liquids. Phys. Chem. Chem. Phys. 2011, 13, 4189.
60. Meniai, A. H.; Newsham, D. M. T., The selection of solvents for liquid-liquid extraction. Chem. Engg. Res. Des. 1992, 70, 78.
61. Harper, P. M.; Gani, R.; Kolar, P.; Ishikawa, T., Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilib. 1999, 158, 337.
62. Stanescu, I.; Achenie, L. E. K., A theoretical study of solvent effects on Kolbe-Schmitt reaction kinetics. Chem. Engg. Sci. 2006, 61, 6199.
63. Satyanarayana, K. C.; Abildskov, J.; Gani, R.; Tsolou, G.; Mavrantzas, V. G., Computer aided polymer design using multi-scale modelling. Brazilian J. Chem. Engg. 2010, 27, 369.
64. Gu, C. H.; Li, H.; Gandhi, R. B.; Raghavan, K., Grouping solvents by statistical analysis of solvent property parameters: implication to polymorph screening. Int. J. Pharmaceutics 2004, 283, 117.
65. Mustaffa, A. A.; Kontogeorgis, G. M.; Gani, R., Analysis and application of GC(Plus) models for property prediction of organic chemical systems. Fluid Phase Equilib. 2011, 302, 274.
66. Gani, R.; Harper, P. M.; Hostrup, M., Automatic creation of missing groups through connectivity index for pure-component property prediction. Ind. Engg. Chem. Res. 2005, 44, 7262.
67. Conte, E.; Martinho, A.; Matos, H. A.; Gani, R., Combined Group-Contribution and Atom Connectivity Index-Based Methods for Estimation of Surface Tension and Viscosity. Ind. Engg. Chem. Res. 2008, 47, 7940.
68. Gmehling, J.; Li, J. D.; Schiller, M., A modified UNIFAC model. 2. Present parameter matrix and results for different thermodynamic properties. Ind. Engg. Chem. Res. 1993, 32, 178.
69. Hahnenkamp, I.; Graubner, G.; Gmehling, J., Measurement and prediction of solubilities of active pharmaceutical ingredients. Int. J. Pharmaceutics 2010, 388, 73.
70. Gonzalez, H. E.; Abildskov, J.; Gani, R.; Rousseaux, P.; Le Bert, B., A method for prediction of UNIFAC group interaction parameters. AICHE J. 2007, 53, 1620.
71. Pretel, E. J.; Lopez, P. A.; Bottini, S. B.; Brignole, E. A., Computer-aided molecular design for solvents for separation processes. AICHE J. 1994, 40, 1349.
72. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K., Tuning the morphology of pharmaceutical compounds via model based solvent selection. Chin. J. Chem. Engg. 2008, 16, 465.
73. Marrero, J.; Gani, R., Group-contribution-based estimation of octanol/water partition coefficient and aqueous solubility. Ind. Engg. Chem. Res. 2002, 41, 6623.
74. Derawi, S. O.; Kontogeorgis, G. M.; Stenby, E. H., Application of group contribution models to the calculation of the octanol-water partition coefficient. Ind. Engg. Chem. Res. 2001, 40, 434.
75. Sedykh, A. Y.; Klopman, G., A structural analogue approach to the prediction of the octanol-water partition coefficient. J. Chem. Inf. Modeling 2006, 46, 1598.
76. Soskic, M.; Plavsic, D., Modeling the octanol-water partition coefficients by an optimized molecular connectivity index. J. Chem. Inf. Modeling 2005, 45, 930.
77. Tung, H.-H.; Tabora, J.; Variankaval, N.; Bakken, D.; Chen, C.-C., Pharmaceutics, preformulation and drug delivery. Prediction of Pharmaceutical Solubility Via NRTL-SAC and COSMO-SAC. J. Pharm. Sci. 2008, 97, 1813.
78. Modarresi, H.; Conte, E.; Abildskov, J.; Gani, R.; Crafts, P., Model-based calculation of solid solubility for solvent selection - A review. Ind. Engg. Chem. Res. 2008, 47, 5234.
79. Ruether, F.; Sadowski, G., Modeling the Solubility of Pharmaceuticals in Pure Solvents and Solvent Mixtures for Drug Process Design. J. Pharm. Sci. 2009, 98, 4205.
80. Martin, T. M.; Young, D. M., Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem. Res. Toxicol. 2001, 14, 1378.
81. Hukkerikar, A. S.; Kalakul, S.; Sarup, B.; Young, D. M.; Sin, G.; Gani, R., Estimation of Environment-Related Properties of Chemicals for Design of Sustainable Processes: Development of Group-Contribution(+) (GC(+)) Property Models and Uncertainty Analysis. J. Chem. Inf. Modeling 2012, 52, 2823.
82. Zhang, S.; Lu, X.; Zhang, Y.; Zhou, Q.; Sun, J.; Han, L.; Yue, G.; Liu, X.; Cheng, W.; Li, S., Ionic Liquids and Relative Process Design. Mol. Thermo. Complex Systems 2009, 131, 143.
83. Abraham, M. H.; Acree, W. E., Hydrogen bond descriptors and other properties of ion pairs. New J. Chem. 2011, 35, 1740.
84. Odele, O.; Macchietto, S., Computer-aided molecular design - A novel method for optimal solvent selection. Fluid Phase Equilib. 1993, 82, 47.
85. Eljack, F. T.; Eden, M. R.; Kazantzi, V.; Qin, X.; El-Halwagi, M. A., Simultaneous process and molecular design - A property based approach. AICHE J. 2007, 53, 1232.
86. Vaidyanathan, R.; El-Halwagi, M. M., Computer-aided synthesis of polymers and blends with target properties. Ind. Engg. Chem. Res. 1996, 35, 627.
87. Edgar, F. T.; Himmelblau, D. M.; Lasdon, L. S., Optimization of chemical processes. 2nd ed.; McGraw-Hill: 2001.
88. Duran, M. A.; Grossmann, I. E., An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming 1986, 36, 307.
89. Wang, Y. P.; Achenie, L. E. K., Computer aided solvent design for extractive fermentation. Fluid Phase Equilib. 2002, 201, 1.
90. Sinha, M.; Achenie, L. E. K.; Ostrovsky, G. M., Environmentally benign solvent design by global optimization. Comp. Chem. Engg. 1999, 23, 1381.
91. Ostrovsky, G. M.; Achenie, L. E. K.; Sinha, M., A reduced dimension branch-and-bound algorithm for molecular design. Comp. Chem. Engg. 2003, 27, 551.
92. Sinha, M.; Achenie, L. E. K.; Gani, R., Blanket wash solvent blend design using interval analysis. Ind. Engg. Chem. Res. 2003, 42, 516.
93. Floudas, C. A.; Gounaris, C. E., A review of recent advances in global optimization. J. Global Optim. 2009, 45, 3.
94. Cheng, H. C.; Wang, F. S., Computer-aided biocompatible solvent design for an integrated extractive fermentation-separation process. Chem. Engg. J. 2010, 162, 809.
95. Sahinidis, N. V., Optimization techniques in molecular structure and function elucidation. Comp. Chem. Engg. 2009, 33, 2055.
96. Fouskakis, D.; Draper, D., Stochastic optimization: a review. Int. Statist. Rev. 2002, 70, 315.
97. Venkatasubramanian, V.; Chan, K.; Caruthers, J. M., Computer-aided molecular design using genetic algorithms. Comp. Chem. Engg. 1994, 18, 833.
98. van Dyk, B.; Nieuwoudt, I., Design of solvents for extractive distillation. Ind. Engg. Chem. Res. 2000, 39, 1423.
99. Lehmann, A.; Maranas, C. D., Molecular design using quantum chemical calculations for property estimation. Ind. Engg. Chem. Res. 2004, 43, 3419.
100. Lin, B.; Chavali, S.; Camarda, K.; Miller, D. C., Computer-aided molecular design using Tabu search. Comp. Chem. Engg. 2005, 29, 337.
101. Wu, L. L.; Chang, W. X.; Guan, G. F., Extractants design based on an improved genetic algorithm. Ind. Engg. Chem. Res. 2007, 46, 1254.
102. Song, J.; Song, H. H., Computer-aided molecular design of environmentally friendly solvents for separation processes. Chem. Engg. Tech. 2008, 31, 177.
103. Serrato, B. J. C.; Gómez, P. J.; Caicedo, A. L. M., Sequential Evolutionary Design-Quantum Calculations For Selecting Extractants. 20th European Symposium on Computer Aided Process Engineering – ESCAPE20 2010.
104. Kim, K. J.; Diwekar, U. M., Efficient combinatorial optimization under uncertainty. 2. Application to stochastic solvent selection. Ind. Engg. Chem. Res. 2002, 41, 1285.
105. Xu, W.; Diwekar, U. M., Improved genetic algorithms for deterministic optimization and optimization under uncertainty. Part II. Solvent selection under uncertainty. Ind. Engg. Chem. Res. 2005, 44, 7138.
106. Xu, W. Y.; Diwekar, U. M., Multi-objective integrated solvent selection and solvent recycling under uncertainty using a new genetic algorithm. Int. J. Environ. Pollution 2007, 29, 70.
107. Diwekar, U.; Shastri, Y., Design for environment: a state-of-the-art review. Clean Techn. Environ. Policy 2011, 13, 227.
108. Wang, Y. P.; Achenie, L. E. K., A hybrid global optimization approach for solvent design. Comp. Chem. Engg. 2002, 26, 1415.
109. Folic, M.; Adjiman, C. S.; Pistikopoulos, E. N., Computer-aided solvent design for reactions: Maximizing product formation. Ind. Engg. Chem. Res. 2008, 47, 5190.
110. Barwick, V. J., Strategies for solvent selection - A literature review. Trac-Trends in Anal. Chem. 1997, 16, 293.
111. Cismondi, M.; Brignole, E. A., Molecular design of solvents: An efficient search algorithm for branched molecules. Ind. Engg. Chem. Res. 2004, 43, 784.
112. Gernaey, K. V.; Gani, R., A model-based systems approach to pharmaceutical product-process design and analysis. Chem. Engg. Sci. 2010, 65, 5757.
113. Frank, T. C.; Downey, J. R.; Gupta, S. K., Quickly screen solvents for organic solids. Chem. Engg. Prog. 1999, 95, 41.
114. Abildskov, J.; O'Connell, J. P., Predicting the solubilities of complex chemicals I. Solutes in different solvents. Ind. Engg. Chem. Res. 2003, 42, 5622.
115. Savova, M.; Kolusheva, T.; Stourza, A.; Seikova, I., The use of group contribution method for predicting the solubility of seed polyphenols of Vitis Vinifera L. with a wide polarity range in solvent mixtures. J. Univ. Chem. Tech. Metallurgy 2007, 42, 295.
116. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K.; Sithambaram, S.; Suib, S. L.; Gani, R., An experimental verification of morphology of ibuprofen crystals from CAMD designed solvent. Chem. Engg. Sci. 2007, 62, 3276.
117. Mirmehrabi, M.; Rohani, S., An approach to solvent screening for crystallization of polymorphic pharmaceuticals and fine chemicals. J. Pharm. Sci. 2005, 94, 1560.
118. Acquah, C.; Karunanithi, A. T.; Cagnetta, M.; Achenie, L. E. K.; Suib, S. L., Linear models for prediction of ibuprofen crystal morphology based on hydrogen bonding propensities. Fluid Phase Equilib. 2009, 277, 73.
119. Winn, D.; Doherty, M. F., Modeling crystal shapes of organic materials grown from solution. AICHE J. 2000, 46, 1348.
120. Zilnik, L. F.; Jazbinsek, A.; Hvala, A.; Vrecer, F.; Klamt, A., Solubility of sodium diclofenac in different solvents. Fluid Phase Equilib. 2007, 261, 140.
121. Jones, H. P.; Davey, R. J.; Cox, B. G., Crystallization of a salt of a weak organic acid and base: Solubility relations, supersaturation control and polymorphic behavior. J. Phys. Chem. B 2005, 109, 5273.
122. Ma, D. L.; Tafti, D. K.; Braatz, R. D., Optimal control and simulation of multidimensional crystallization processes. Comp. Chem. Engg. 2002, 26, 1103.
123. Birch, S. F.; Dean, R. A.; Fidler, F. A.; Lowry, R. A., The preparation of the c(10) monocyclic aromatic hydrocarbons. J. Am. Chem. Soc. 1949, 71, 1362.
124. Wang, Q.; Ma, P. S.; Wang, C.; Xia, S. Q., Position Group Contribution Method for Predicting the Normal Boiling Point of Organic Compounds. Chin. J. Chem. Engg. 2009, 17, 254.
125. Stefanis, E.; Constantinou, L.; Panayiotou, C., A group-contribution method for predicting pure component properties of biochemical and safety interest. Ind. Engg. Chem. Res. 2004, 43, 6253.
126. Wang, Q.; Jia, Q.; Ma, P., Position group contribution method for the prediction of critical pressure of organic compounds. J. Chem. Engg. Data 2008, 53, 1877.
127. Sheldon, T. J.; Adjiman, C. S.; Cordiner, J. L., Pure component properties from group contribution: Hydrogen-bond basicity, hydrogen-bond acidity, Hildebrand solubility parameter, macroscopic surface tension, dipole moment, refractive index and dielectric constant. Fluid Phase Equilib. 2005, 231, 27.
128. Jia, Q. Z.; Wang, Q. A.; Ma, P. S., Prediction of the Enthalpy of Vaporization of Organic Compounds at Their Normal Boiling Point with the Positional Distributive Contribution Method. J. Chem. Engg. Data 2010, 55, 5614.
129. Valderrama, J. O.; Rojas, R. E., Mass connectivity index, a new molecular parameter for the estimation of ionic liquid properties. Fluid Phase Equilib. 2010, 297, 107.
130. Aziz, S.; John, P. E.; Khadikar, P. V., Use of structure codes (counts) for computing Sadhana (Sd) index of phenylenes and its hexagonal squeezes for QSAR studies. J. Ind. Chem. Soc. 2010, 87, 1449.
131. Souza, E. S.; Zaramello, L.; Kuhnen, C. A.; Junkes, B. D.; Yunes, R. A.; Heinzen, V. E. F., Estimating the Octanol/Water Partition Coefficient for Aliphatic Organic Compounds Using Semi-Empirical Electrotopological Index. Int. J. Mol. Sci. 2011, 12, 7250.
132. Trofimov, M. I.; Smolenskii, E. A., Application of the electronegativity indices of organic molecules to tasks of chemical informatics. Russ. Chem. Bul. 2005, 54, 2235.
133. Shamsipur, M.; Hemmateenejad, B.; Ghavami, R.; Sharghi, H., Highly correlating distance-connectivity-based topological indices. 4: Stepwise factor selection-based PCR models for QSPR study of 14 properties of monoalkenes. Pol. J. Chem. 2007, 81, 269.
134. Gupta, M.; Gupta, S.; Dureja, H.; Madan, A. K., Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR. Chem. Biol. Drug Des. 2012, 79, 38.
135. Junkes, B. D.; Arruda, A. C. S.; Yunes, R. A.; Porto, L. C.; Heinzen, V. E. F., Semi-empirical topological index: a tool for QSPR/QSAR studies. J. Mol. Modeling 2005, 11, 128.
136. Torrens, F., Valence topological charge-transfer indices for dipole moments. Theochem. J. Mol. Struct. 2003, 621, 37.
137. Ren, B. Y., Application of novel atom-type AI topological indices in the structure-property correlations. Theochem. J. Mol. Struct. 2002, 586, 137.
138. Li, X. H., The extended Wiener index. Chem. Phys. Lett. 2002, 365, 135.
139. Patel, S. J.; Ng, D.; Mannan, M. S., QSPR Flash Point Prediction of Solvents Using Topological Indices for Application in Computer Aided Molecular Design. Ind. Engg. Chem. Res. 2009, 48, 7378.
140. Stefanis, E.; Panayiotou, C., Prediction of Hansen solubility parameters with a new group-contribution method. Int. J. Thermophysics 2008, 29, 568.
141. Pretel, E. J.; Lopez, P. A.; Bottini, S. B.; Brignole, E. A., Computer-aided molecular design of solvents for separation processes. AICHE J. 1994, 40, 1349.
142. Marcoulaki, E. C.; Kokossis, A. C., On the development of novel chemicals using a systematic optimisation approach. Part II. Solvent design. Chem. Engg. Sci. 2000, 55, 2547.
143. Papadopoulos, A. I.; Linke, P., Efficient integration of optimal solvent and process design using molecular clustering. Chem. Engg. Sci. 2006, 61, 6316.
144. Chemmangattuvalappil, N. G.; Solvason, C. C.; Bommareddy, S.; Eden, M. R., Combined property clustering and GC(+) techniques for process and product design. Comp. Chem. Engg. 2010, 34, 582.
Appendix

Illustrations of normal melting point estimation using various structure-property prediction methods. The experimentally determined melting point of 2,5-dimethylbenzoic acid reported in the literature (ref 123) is about 405.15 K.
Estimation of normal melting point of 2,5-dimethylbenzoic acid

Compound:            2,5-dimethylbenzoic acid
Molecular structure: (structure drawing not reproduced)
CAS No.:             610-72-0
Molecular formula:   C9H10O2
Molecular weight:    150.177

Group contribution method

Groups          Occurrences (Ni)    Contribution (Tmi)
CH3             2                   -5.1
COOH            1                   155.5
=CH- (ring)     3                   8.13
=C< (ring)      3                   37.02

$$T_m = 122.5 + \sum_i N_i T_{mi}$$

Tm = 122.5 - 5.1 × 2 + 155.5 × 1 + 8.13 × 3 + 37.02 × 3 = 403.25 K
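
The linear sum above can be checked in a few lines. The following Python sketch is illustrative only: the dictionary layout and the function name `linear_gc_melting_point` are assumptions introduced here, while the group occurrences, the contributions, and the constant 122.5 are copied from the table and equation above.

```python
# Linear group-contribution estimate: Tm = base + sum(N_i * Tm_i).
# Keys are group names; values are (occurrences N_i, contribution Tm_i in K).
# All numbers are taken from the appendix table; the data layout is hypothetical.
GROUPS = {
    "CH3": (2, -5.1),
    "COOH": (1, 155.5),
    "=CH- (ring)": (3, 8.13),
    "=C< (ring)": (3, 37.02),
}


def linear_gc_melting_point(groups, base=122.5):
    """Return the estimated melting point in K for a {name: (N, Tm)} mapping."""
    return base + sum(n * contribution for n, contribution in groups.values())


tm_linear = linear_gc_melting_point(GROUPS)
print(f"Estimated Tm = {tm_linear:.2f} K")
```

Under these inputs the function reproduces the 403.25 K obtained by hand above.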
Constantinou-Gani method

First order groups     Occurrences (Ni)    Contribution (Tm1i)
aC-CH3                 2                   1.8635
aCH                    3                   1.4669
aC                     1                   0.2098
COOH                   1                   11.563

Second order groups    Occurrences (Mj)    Contribution (Tm2j)
aCCOOH                 1                   28.4324
COH                    1                   0.3189

Tmo = 102.425

$$\exp\left(\frac{T_m}{T_{mo}}\right) = \sum_i N_i T_{m1i} + \sum_j M_j T_{m2j}$$

exp(Tm/102.425) = (1.8635 × 2 + 1.4669 × 3 + 0.2098 × 1 + 11.563 × 1) + (28.4324 × 1 + 0.3189 × 1) = 48.6518

Tm = ln(48.6518) × 102.425 = 397.88 K
Marrero-Gani method

First order groups     Occurrences (Ni)    Contribution (Tm1i)
aC-COOH                1                   12.4296
aC-CH3                 2                   1.0068
aCH                    3                   0.5860

Second order groups    Occurrences (Mj)    Contribution (Tm2j)
C-OH                   1                   0.3695

* No third order groups are involved

Tmo = 147.45

$$\exp\left(\frac{T_m}{T_{mo}}\right) = \sum_i N_i T_{m1i} + \sum_j M_j T_{m2j} + \sum_k O_k T_{m3k}$$

exp(Tm/147.45) = (12.4296 × 1 + 1.0068 × 2 + 0.5860 × 3) + (0.3695 × 1) = 16.5707

Tm = ln(16.5707) × 147.45 = 413.985 K
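
The Constantinou-Gani and Marrero-Gani estimates both invert the same logarithmic form, Tm = Tmo × ln(sum of contributions). A minimal Python sketch follows, assuming each group order is passed as a list of (occurrences, contribution) pairs; the function name `log_form_melting_point` and the variable names are hypothetical, and the numbers are copied from the two worked examples above.

```python
import math


def log_form_melting_point(tm0, *orders):
    """Solve exp(Tm / tm0) = sum(N * contribution) over all group orders for Tm.

    Each element of `orders` is a list of (occurrences, contribution) pairs,
    one list per group order (first, second, third, ...). Empty lists are allowed.
    """
    total = sum(n * contribution for order in orders for n, contribution in order)
    return tm0 * math.log(total)


# Constantinou-Gani: first and second order groups, Tmo = 102.425
cg_first = [(2, 1.8635), (3, 1.4669), (1, 0.2098), (1, 11.563)]
cg_second = [(1, 28.4324), (1, 0.3189)]
tm_cg = log_form_melting_point(102.425, cg_first, cg_second)

# Marrero-Gani: first and second order groups, no third order groups, Tmo = 147.45
mg_first = [(1, 12.4296), (2, 1.0068), (3, 0.5860)]
mg_second = [(1, 0.3695)]
mg_third = []
tm_mg = log_form_melting_point(147.45, mg_first, mg_second, mg_third)

print(f"Constantinou-Gani Tm = {tm_cg:.2f} K")
print(f"Marrero-Gani Tm      = {tm_mg:.2f} K")
```

With these inputs the sketch agrees with the hand calculations above to within rounding in the last digit.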
Position group contribution method

Group               Occurrences (N)     Contribution (A)
CO-(CH)(O)          1                   34.246
Cb-(H)              3                   6.224
Cb-(C)              2                   -331.51
C-(Cb)(H)3          2                   49.96
O-(CO)(H)           1                   369.423
Cb-(COOH)           1                   1181.043

Group               Occurrences (Pk)    Contribution (Ak)
Ortho correction    1                   0.777
Meta correction     2                   -7.374

Tmo = 5963.486, N = 10, a1 = -5758.997, a2 = 51.127

$$T_m = T_{mo} + \sum_i A_i N_i + \sum_j A_j \tanh\left(\frac{N_j}{N}\right) + \sum_k A_k P_k + a_1 \exp\left(\frac{1}{M_w}\right) + a_2 \exp\left(\frac{1}{N}\right)$$

Tm = 5963.486 + 49.96 × 2 + 34.246 × tanh(1/10) + 6.224 × tanh(3/10) - 331.51 × tanh(2/10) + 369.423 × tanh(1/10) + 1181.043 × tanh(1/10) + 1 × 0.777 - 2 × 7.374 - 5758.997 × exp(1/150.177) + 51.127 × exp(1/10) = 402.79 K
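
Because this expression mixes linear, tanh, and exponential terms, it is easy to slip when evaluating it by hand. The Python sketch below retraces the arithmetic shown above. It rests on two assumptions read off the worked line rather than stated in the tables: only C-(Cb)(H)3 enters the linear sum, and every other group enters through tanh(Nj/N). The function and argument names are hypothetical.

```python
import math


def position_gc_melting_point(tm0, linear, scaled, position, a1, a2, n_total, mol_weight):
    """Evaluate the position group contribution expression for Tm in K.

    linear   : (occurrences, A) pairs entering as A * N
    scaled   : (occurrences, A) pairs entering as A * tanh(N / n_total)
    position : (occurrences, A) pairs for position corrections, entering as A * P
    """
    tm = tm0
    tm += sum(a * n for n, a in linear)
    tm += sum(a * math.tanh(n / n_total) for n, a in scaled)
    tm += sum(a * p for p, a in position)
    tm += a1 * math.exp(1.0 / mol_weight) + a2 * math.exp(1.0 / n_total)
    return tm


# Split between `linear` and `scaled` follows the worked arithmetic (assumption).
linear_groups = [(2, 49.96)]                      # C-(Cb)(H)3
scaled_groups = [
    (1, 34.246),     # CO-(CH)(O)
    (3, 6.224),      # Cb-(H)
    (2, -331.51),    # Cb-(C)
    (1, 369.423),    # O-(CO)(H)
    (1, 1181.043),   # Cb-(COOH)
]
position_terms = [(1, 0.777), (2, -7.374)]        # ortho, meta corrections

tm_position = position_gc_melting_point(
    tm0=5963.486,
    linear=linear_groups,
    scaled=scaled_groups,
    position=position_terms,
    a1=-5758.997,
    a2=51.127,
    n_total=10,
    mol_weight=150.177,
)
print(f"Position group contribution Tm = {tm_position:.2f} K")
```

The result agrees with the 402.79 K reported above. Note that the final value is a small difference between large terms (Tmo and the a1 term are each several thousand kelvin), so the constants should be entered at full precision.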
Connectivity index method
Group
δ
δv
Occurrence
-CH3
1
1
2
=CH-
2
3
3
=C