A Novel Methodology for Property-Based Molecular Design Using

Jan 9, 2013 - The inverse problem of identifying molecules from a set of property ...... small number of additional parameters in the final problem fo...
5 downloads 0 Views 980KB Size
Article pubs.acs.org/IECR

A Novel Methodology for Property-Based Molecular Design Using Multiple Topological Indices Nishanth G. Chemmangattuvalappil†,‡ and Mario R. Eden*,‡ †

Department of Chemical and Environmental Engineering, University of Nottingham Malaysia, Semenyih, Selangor, Malaysia Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States



ABSTRACT: In this work, we are introducing an algorithm that can be used for solving molecular design problems of reactive systems on a property-based platform. The property clustering technique is extended to identify a set of target properties for the components in the system that provides the optimum process performance. Once the property targets have been identified, a molecular design problem can be formulated to identify the potential candidate molecules that meet the targets identified in the previous problem. The molecular design involves the identification of potential molecules possible from the specific types of reactions in the process. To design molecules, a recently introduced concept known as molecular signature descriptors has been used. The molecular signatures can be tailored to track the changes in molecular groups in a molecule resulting from different types of chemical reactions. The changes in the chemical structure can be correlated with the changes in the properties of the molecule. Therefore, the changes in the molecular structure due to reactions can be represented as a function of the property. The developed algorithm applies different qualitative structure activity/property relationships (QSAR/QSPR) to estimate properties from the molecular structure. QSAR relationships make use of different topological indices and it has been shown that a number of topological indices of molecules can be represented in terms of molecular signatures and that it is possible to correlate the topological indices to the actual properties and biological activities. Here, the new algorithm utilizes molecular property operators on the basis of signatures for solving the inverse problem of obtaining the molecular structures that satisfy the property targets estimated in the process design step. A new set of equations are employed to ensure that the molecule meets the safety and environmental constraints as well. The principles of graph theory are incorporated to track the signatures and to avoid the generation of infeasible molecular structures.

1. INTRODUCTION The inverse problem of identifying molecules from a set of property constraints is a relatively new problem from a chemical engineering perspective. Current methodologies are very specific to certain classes of property models and even in such restricted situations, the accuracy and computational expenses need improvement. Most of the existing methods use group contribution-based approaches for solving the inverse design problem.1 A number of recent publications have used the group contribution model (GCM)-based techniques to solve for different classes of molecular design problems.2−5 However, the suitability of group contribution methods as a tool for molecular design is limited in its applications because of the following reasons: (1) it is not always possible to find a suitable correlation between the molecular groups and the desired properties; (2) not all possible atomic arrangements are represented in GCM; (3) many group contribution models have limited ranges of accuracy. The chemical structure can provide a lot of information about the properties of the molecule. The information available from a structural formula includes the total number of atoms, the number of each type of atoms, and the bonding between the atoms. These sets of information enable the structure to be represented in a graphical form.6 The molecular graphs are generally represented as hydrogen-suppressed graphs where only the molecular skeletons without hydrogen atoms (except for heteroatoms) are used. Double and triple bonds are also not shown in the hydrogen-suppressed graph. The presence of hydrogen atoms and multiple bonds are handled in the general © 2013 American Chemical Society

formulation of molecular indices. The difference between normal molecular structures and hydrogen-suppressed graphs is shown in Figure 1. The first figure is the molecular structure of acetone

Figure 1. Example of hydrogen suppressed graphs.

and its hydrogen suppressed graph representation is shown in the second figure. In the hydrogen-suppressed graph, the numbers 1, 2, and 4 represent the carbon atoms without hydrogen atoms on it and 3 represents the oxygen atom. The edges a, b, and c represent the bonds connecting these atoms. Even though the double bond and single bonds are represented identically, the bond indices are defined to take care of that difference. The representation of a molecule in the form of a graph can provide a great amount of information about its properties through Special Issue: PSE-2012 Received: Revised: Accepted: Published: 7090

September 25, 2012 January 5, 2013 January 9, 2013 January 9, 2013 dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

a large number of structures. However, because of templating, the generation of nonintuitive solutions may be eliminated. In some recent works, heuristic methods have been used in order to handle the nonlinear constraints.19−21 These methods cannot ensure a global optimum solution. However, in order to ensure that the solution is near optimum, Tabu search methods have been employed while solving the problem. In Tabu search, a library will be generated that keeps track of the obtained local optima solutions to prevent the generation of the same local minima solution as the search proceeds.20 Even though this technique ensures better quality of the solutions, the optimum solution can never be guaranteed because of the nonlinear constraints. Apart from these limitations, the above-mentioned techniques can be used only when the QSAR/QSPRs are based on one topological index. For many properties, the QSAR/QSPRs are formed based on more than one topological index. In addition, the topological indices required to form the QSAR/QSPRs may be different for different properties of interest. For instance, the QSPR for one property target may be based on connectivity indices and the second property target may be based on shape indices. Since different topological indices are formulated using different mathematical expressions, there is no standard way to combine everything on a common platform and solve simultaneously. Therefore, the current techniques available that make use of QSAR/QSPR models in reverse problem formulations can handle property models with one topological index. Different topological indices have to be formulated using different mathematical transformations. In addition, the degeneracy of solutions in all these methodologies is very high. That means, for a specific solution, there could be many possible molecular structures. The degeneracy increases with the size of the molecules in the solution. In this work, we are looking for a computationally efficient algorithm for molecular design that can incorporate different QSAR/QSPRs simultaneously. A recently developed molecular descriptor called signature descriptors22−25 has some convenient features that can be used for this purpose. A variety of TIs can be represented as linear combinations of molecular signatures. In an inverse design, the property targets can be satisfied using different property models that use different classes of topological indices. The signature descriptors can form a common platform to represent these different property models. Therefore, this approach has the potential to develop an efficient methodology for the design of molecules with QSARs/QSPRs related to diverse TIs.

the use of molecular descriptors. Molecular descriptors are operators developed from the molecular graph to characterize the properties of the molecule. The numerical value obtained after performing the operations suggested by the descriptor on the molecular graph can generally be used to correlate and predict physical properties and biological activities.7 There are numerous molecular descriptors available and this makes it difficult to select the appropriate one(s) for a specific problem. A lot of work has been done in molecular design using Topological Indices (TI) as structural descriptors.8−12 In all these approaches, the descriptors’ structural features have been used to generate the feasible molecular structures. However, the inverse relationships between the topological indices generally do not provide a unique molecular graph.13 Therefore, the degeneracy in these approaches is very large. In addition, most topological indices exhibit highly nonlinear functional dependence on the elements of the vertex-adjacency matrix.14 Therefore it is difficult to obtain a global solution while employing mathematical programming techniques. In more recent works,14,15 methodologies were developed to incorporate topological indices within an optimization framework. In the work by Raman and Maranas,14 many hydrocarbon properties are correlated with connectivity indices and shape indices of different orders. According to Raman and Maranas,14 the molecular structure was represented in the form of a vertex adjacency matrix that can completely explain the molecular interconnectivity. Even though the actual topological indices used in this work are nonlinear, the matrix representation has been used to systematically transform them into linear form. A mixed integer linear program (MILP) formulation has been formed which ensures that a global optimum solution can be achieved. However, its application has been limited to the design of alkyl structures. In the work by Camarda and Maranas,15 nonlinearities due to the expressions for connectivity indices led to MINLP formulations, which make the solution methodology computationally expensive and susceptible to local optima traps. Another important inverse problem solution technique was developed by forming a target scaffold for drug design.16 This is the first attempt to apply mathematical programming techniques in the area of drug design. A target scaffold based on a drug molecule was used to generate a QSAR, and the inverse problem was solved for the best values of selectivity by changing the substituents on the scaffold.16 The limitation for this approach was that it is not capable to provide nonintuitive solutions because the scaffold limits the type of molecules obtained as solutions. However, this approach was effective in controlling the combinatorial explosion. In a later contribution, a fitness function was directly incorporated into an optimization framework.17 However, in many formulations, a global optimum solution cannot be guaranteed using this approach. In another work, second order connectivity indices were used for the design of value added soybean oil products.18 The interesting aspect of the above work is that the highly nonlinear second-order connectivity indices weren represented with the exact linear equivalent expressions using Glover transformation. By applying the Glover transformation, the nonconvex terms were converted into products of binary variables. Even though this approach eliminates the possibility of local minima traps by avoiding the nonconvex terms, the computational requirements are relatively high. To decrease the computational complexity, an approach known as “templating” has been applied in this work. In templating, a portion of the vertex adjacency matrix is predefined to control the generation of

2. THEORETICAL BACKGROUND 2.1. Reverse Problem Formulation and Property Operators. Simultaneous consideration of process requirements and molecular design aspects can lead to computational difficulties. The reverse problem formulation concept has been developed based on the understanding that the complexity of such problems is a result of the nonlinearity of the property models.26,27 A logical way to reduce the complexity of such a system of equations is to apply decomposition techniques. The basic concept behind the reverse problem formulation is to decompose an integrated process and molecular design problem into two independent subproblems that are linked through a set of property targets. The first subproblem is the reverse of a simulation problem. Here, the property targets of the input molecules that give the optimum process performance will be identified. The second subproblem is the reverse of a property prediction problem. Here, the molecules that meet the property targets identified during 7091

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

the first reverse problem will be identified. Since both reverse problems aim to match the same set of property targets, the accuracy of the solution will not be compromised.26 One of the main challenges in developing a property-based approach to solve integrated process and product design problems is that, unlike mass or energy, properties are not conserved. However, instead of tracking properties directly, a set of surrogate properties known as property operators can be tracked using linear mixing rules.28 For a mixture made up of Ns streams and described by j properties, the property operator, Ψj(PjM) corresponding to the property P is formulated as shown in eq 1: Ng

ψj(PjM ) =

∑ s=1

Fs N

∑s =s 1 Fs

atom x. In general, when the tree has been constructed up to level h − n, then, layer h − n + 1 will be constructed considering each vertex of layer h − n. All vertices in the tree are labeled and colored with the necessary coloring function. The vertex color of a graph is the assignment of a unique description provided to its vertices in order to distinguish among different groups of atoms. The coloring function will be selected on the basis of the type of molecule. Typical coloring functions include the type of atoms, type of bonds, and valency. It is possible to have one vertex appear more than once in the graph. However, no edge should be repeated in the same graph. (4) The signature can be written by reading the tree from the atom x. The child level vertices will be enclosed in parentheses at each level. The vertex color must be written along with the vertices in each level. After each level, the next level must be followed until the signature reaches the required height. While writing the signatures, all the neighbors including the root atom must be written. The above procedure has been summarized in Figure.2.

Ng

ψj(Pjs) =

∑ xsψj(Pjs) s=1

(1)

Here, Ψj(Pjs) is the operator of the jth property Pjs of stream s, and Fs is the fractional contribution to the total flow rate of stream s. In the process system, the properties may be of different units and magnitudes. To make the properties dimensionless and of similar magnitude, the property operators are divided by appropriately chosen reference operators. The normalized property operator thus obtained is defined as shown in eq 2: Ωjs =

ψj(Pjs) ψj(P jref )

(2)

Pjref

where is the reference value. Most molecular design algorithms make use of the group contribution methods.2 Much work has been done recently to design molecules with a set of target properties using alternative computationally efficient algorithms.2−5,16 2.2. Development of Molecular Signature. The concept of molecular signature22−24 is useful for reverse problem formulations because they form a finite set of not highly correlated descriptors based on the molecular structure from which many other topological indices can be calculated. The molecular signature is a systematic way of representing the atoms in a molecule using the extended valencies to a predefined height. The systematic procedure for the identification of the signature of a molecule as developed by Visco and Faulon22−24 is explained in this section. One of the characteristics that make the molecular signature descriptors unique among other molecular descriptors is that the building blocks of the molecular signatures complement each other. If G is a molecular graph and x is an atom of G, the atomic signature of height h of x is a canonical representation of the subgraph of G containing all atoms that are at a distance h from x. The height of the signature is the level to which the neighborhood information is sought. In other words, it is the diameter of the subgraph generated from the atomic signature. The canonical representation can be achieved by the following systematic procedure. (1) All atoms (vertices) in the graph are labeled in a canonical order starting with atom x. (2) For atom x, for which the atomic signature is to be constructed, all atoms and bonds will be shown up to the height h in the sub graph hG(x). (3) Construct the tree that spans over all the edges in the sub graph. The root of the tree is the atom x itself. The tree is constructed one layer at a time up to level h. The first layer of the tree is the nearest neighbors of atom x, the second layer of the tree consists of all neighbors of the vertices in layer 1 except the

Figure 2. Generation of atomic signatures.

An example of the construction of atomic signature is shown in Figure 3. Here, the stepwise procedure for obtaining the atomic signature of atom N up to height 3 in a molecule is illustrated. In the first step, the atoms are labeled to distinguish them from each other while writing the molecular signatures in the later stage. In the second step, all the atoms at height 3 from atom N are extracted. In other words, the neighbors of height three includes the atoms bonded to N (say y), the atoms bonded to all atoms in y (say z) and all the atoms bonded to all atoms in z. In the third step, the molecular groups are replaced with vertices. All the vertices are colored with the atom type, valency and the type of bond. Here the atoms types are C and N. The different carbon types are distinguished by their valencies and the type of bond. Note that the bond type has been retained in the canonical representation in step 2. In the final step, the signatures of different heights have been formed by reading the tree starting from the root N atom. Atomic signature of height zero is the root N atom itself. Atomic signature of height one is the root atom 7092

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

Figure 3. Atomic signature up to height 3.

3. USING MOLECULAR SIGNATURES IN PROPERTY-BASED MOLECULAR DESIGN 3.1. Property Operators and Connectivity Rules. The introduction of molecular signature descriptors enables the representation of a number of topological index-based models on a common platform. Since topological indices are widely employed to form a number of QSAR/QSPR relationships that track a variety of property targets, molecular signatures have the potential to expand the applicability of reverse problem formulation techniques. For this purpose, we exploit an analogy that exists between the definition of property operators and molecular signatures. Similar to property operators, the signature descriptors can be formed as building blocks to a larger unit. The nonlinearity involved in the original definition of the topological index will not appear while solving the inverse problem because, similar to property operators, the unknown parameters in the inverse problem will be the contributions of individual building blocks. While using molecular signatures, the mathematical expressions that contribute to the nonlinearity have been decoupled from the rest of the equations, thus the complexity of the system of equations will be reduced significantly. A property model can be represented using eq 4, where θ is a function of the target property, which can have a nonlinear structure:

followed by the nearest neighbors (in this case, two carbon atoms) enclosed in parentheses. In signatures of height two, the neighbors of the carbon atoms (including the root N atom) are listed in the parentheses. Finally, signature of height three is obtained by adding the neighbors of the atoms in the previous layer. While writing the signatures, the vertices at the different levels have been color coded for clarification of the levels. It is clear that the set of atomic signatures up to a given height is of finite size. So, any molecule can be represented by its coordinates in a vectorial space where the base vectors are its atomic signatures. Thus, the signature of a molecule is defined as the linear combination of atomic signatures.23 If hσG(hXi) is a base vector, hαi is the number of atoms having the signature of the base vector and hKG is the number of base vectors, then the molecular signature hσ(G) is represented as shown in eq 3: h

h

σ (G ) =

∑ x ∈ VG

h

KG

σ G(x) =

∑ hα ihσ G(hX i) i=1

(3)

Signature descriptors have attributes that make them attractive as a tool for reverse problem formulations. Even though they can represent independent building blocks for a complete molecule, they also are related to the rest of the building blocks. This allow us to define a number of topological indices from the signature base.

θ = f (TI) 7093

(4) dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article n1

In addition to the property functions, the topological indices will also contribute to the nonlinearity of the problem. However, there exists a linear relationship between a number of topological indices and molecular signatures.9 The topological indices can be calculated from the molecular signatures using the relationship developed by Faulon et al.:23 TI(G) = khαG· TI(root( h∑ ))

i=1

n1

n2

n3

(9)

i=1

Here, n1, n2, n3, and n4 are the numbers of signature xi with valency one, two, three, and four, respectively. N is the total number of signatures in the molecule and R is the number of cycles in the structure. Equations 7−9 are applicable only in graphs without multiple edges. However, the constituent signatures may have multiple bonds if the design involves the formulation of molecules with multiple bonds. Therefore, eq 9 has been modified to apply for all types of molecules according to eq 10:

(5)

n1

n2

n3

n4

∑ xi + 2 ∑ xi + 3 ∑ xi + 4 ∑ xi i=1

n1

⎡⎡ N 1 = 2⎢⎢∑ xi + ⎢⎣⎢⎣ i = 1 2

n2

n3

NDi

NMi

NTi





i=0

i=0

i=1

⎥⎦

⎥⎦

∑ xi + ∑ xi + ∑ xi⎥ − 1 + R ⎥ (10)

Where NDi, NMi, and NTi are the signatures with one double bond, two double bonds, and one triple bond respectively in the parent level. For every additional bond on the signature root level, the number of edges increases by 0.5 because each bond will be shared by two signatures. As mentioned earlier, when signatures are used as the building blocks for molecular design, no potential signature in the solution set can be considered as completely independent of the rest of the solutions. The number of bonds in each signature should match with the bonds in the other signatures. Therefore, constraints must be formed to ensure that the bonds at each signature height must match to some other signature so that there will be a path connecting all the vertices. To differentiate between different types of atoms and bonds, graph coloring has been used. The consistency of signatures is ensured because when molecular graphs are formed from the signatures, they form a digraph. This means that when molecular graphs are formed from signature building blocks, in addition to the individual constituents, the directions of the building blocks are also significant. In a digraph, the out degree of one vertex is defined as the number of arcs formed from the vertex. For a vertex v, the out degree of v is denoted as p⃗(v). Similarly, the in-degree of vertex v is the number of arcs joining to the vertex. To ensure the consistency of signatures that form a potential solution, a property of digraphs known as the handshaking dilemma30 has been applied. According to the handshaking dilemma, the sum of the indegrees of all the vertices of the digraph will be equal to the sum of their out-degrees:30

N

(6)

While an inverse problem can be formulated based on the signature building blocks, the information about the number of bonds will not be available prior to the solution. Therefore, a relationship between the number of vertices and edges has been used to apply the handshaking lemma for inverse design. Equation 7 can be applied for a simple graph with R circuits: (7)

ρ ⃗ (v ) = ρ ⃖ (v )

Equation 6 can now be represented in terms of the vertices using eq 8:

(11)

When a molecule is formed from signatures, the in-degrees and out-degrees are based on the different types of bonds between the atoms. Since each edge (bond) is shared by two vertices (atoms), the colors of the edge that joins the two vertices must be the same for both the vertices in a complete molecule. However, the order of the colors will be different for both vertices since the reading of the color has to start from the root atom. If (li→lj)h is one coloring sequence li→lj at a level h, then, the following

N

∑ D(i) = 2(V − 1 + R) i=1

n4

= 2[(∑ xi) − 1 + R ]

∑ D(i) = 2M

V=M+1−R

n3

N

In eq 5, TI(G) is the topological index value of the molecule, k is a constant specific to the topological index, hαG is the vector of the occurrence number of atomic signature of height h and TI(root (hΣ)) is the topological index calculated for each of the atomic signature building blocks. The h required in eq 5 depends on the topological index to be calculated. From eq 5, it is clear that the topological indices of the entire molecular structure can be obtained from the individual building blocks if the atomic signatures are available. To obtain a potential solution in an inverse problem, apart from satisfying the property constraints, it is also to be ensured that the signatures will connect together to form a complete molecular graph. Here, there is a significant difference from the structural constraints used in group contribution based algorithms.2−5 In group contribution based approaches, the connectivity among the building blocks can be ensured by avoiding free bonds in the final molecular structure. However, molecular signature based algorithms must ensure that the signatures can be connected together without free bonds and also must be consistent with the rest of the signatures in the solution set. This is because, even though the building blocks are defined independently, all building blocks are related to one another in terms of neighborhood. The structural constraints involved in solving a molecular design problem using signature building blocks have been presented in our previous paper.29 To ensure that the signatures in a potential solution will connect together to form a complete molecule without any free bonds, the handshaking lemma in graph theory can be applied. This rule relates the degrees (valancies of the building blocks) of the individual building blocks to the edges (bonds) in a graph. The mathematical representation of the handshaking lemma is shown in eq 6, where D is the number of degrees and M is the number of edges:

i=1

n2

∑ xi + 2 ∑ xi + 3 ∑ xi + 4 ∑ xi

(8)

Consider the design of molecules without multiple edges. Assume the maximum valency of the hydrogen-suppressed atoms involved is limited to four; therefore, for a collection of molecular signatures to form a complete molecule, eq 9 must be obeyed: 7094

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

height will be able to describe the atoms, bonds, and nearest neighbors. This transformation will be useful when the property models of some of the target properties are available in GC and some of the models are available as QSAR/QSPR. If some of the properties of interest are available in GC, rewriting those models in the form of signatures will allow us to solve the property models based on TIs along with the GC models. The height and number of the signature used to write the molecular groups in the GC model depends on the number of atoms used for the molecular design and the nature of final molecule. Different coloring functions can be used to identify the signatures corresponding to equivalent groups in GC models. For example, in the design of alkane molecules, the possible first order groups are

equation must be satisfied for the existence of a complete molecule:

∑ (li → lj)h = ∑ (lj → li)h

(12)

Equation 12 has to be obeyed for all color sequences and at each height. This rule will be applied even for the color sequence in which i = j. That means, if the color sequence in one signature is li→lj with i = j, then, there must be another signature present in the set of signatures with the same color sequence to complement the previous one. If there is more than one color sequence li→lj on one signature with i = j, then, all of them must be complemented with the same color sequence in other signatures in the set. This can be mathematically represented as follows:

C

∑ ηixi = 2K (13)

i=j

C3(CCC) C2(CC) C1(C)

Now, the color at the root level can be used to distinguish between different molecular groups. For example, the color of 3 at the root level of a signature represents the presence of a CH group and the property contribution for the CH group can be assigned to the signature with root vertex color 3. This rule can be applied to all other signatures and even in designs with more signature building blocks. The most important characteristic of the signature descriptors while using it to represent group contribution models is its ability to account for the contributions of higher order molecular groups. As discussed in the previous sections, higher order group effects are due to the proximity of various first order groups. Therefore, similar to the first order groups, second order groups can also be considered as a tree with D < h − 1. The procedure to track the second order contributions is as follows: In the first step, the signatures are generated only based on first order groups, without considering any second order group contributions. Then, among the generated signatures, those signatures that carry the second order group contribution are identified. While property contribution to that signature is assigned, apart from the contribution of the actual molecular group, the contribution of the second order group is assigned as well. It can be seen that there are no specific signatures for any first order/higher order group. On the basis of the available groups and the nature of the final molecular structure, the designer can identify the corresponding signature to each molecular group. The following examples are taken from the problem presented later in this paper as a case study. The first order molecular groups involved are CH3, CH2, CH, CH3CO, and CHCH. After generating the signatures of height, three corresponding to the first order molecular groups, the following signatures are identified to carry the second order contributions corresponding to the second order groups shown on the right-hand side.

(14)

∑ NC i i

(15)

i

Here, f(X) is a function of the actual property X, Ci is the contribution of the molecular group i that occurs Ni times. To increase the prediction accuracy, higher levels of molecular groups have been identified.31 The higher order groups represent the effects on the properties due to the interaction among different molecular groups. Three levels of molecular groups have been identified which are termed as first order, second order, and third order groups. The general expression for the property functions in GC is31 f (Y ) = (∑ NC i i) + (∑ NsCs) + (∑ Nt Ct ) i

s

t

CH3

C4(CCCC)

Where, ni is the number of child vertex with a higher degree than the parent vertex. Here, i and j represent the child and parent colors. 3.2. Expression of Group Contribution Models with Signatures. A variety of properties can be accurately predicted from molecular structures using the group contribution models (GC). In GC, the property function of a compound is estimated as the summation of property contributions of all the molecular groups present in the molecular structure.31 f (X ) =

CH 2

If the building blocks in this design are signatures of height one, the following signatures form the basic building blocks:

Where, η is the number of color sequences li→lj on one signature with i = j and x is the number of such color sequences and K is an integer. To form a connected tree, it is also should be ensured that the total number of signatures where the degree of the vertex at a higher level is more than the degree at a lower level should be less than the total number of vertices with the higher degree. In some signatures, there will be more than one child with a specific color (say m) for a single parent. In such cases, it must be ensured that the number of complementary signatures with the previous parent in the child level must be more than m.

∑ xini ≤ ∑ xj

CH

(16)

Ni, Ns, and Nt are the number of first, second, and third order groups, respectively, and Ci,Cs, and Ct are their respective GC property contributions predicted by Marrero and Gani.31 Molecular signatures of sufficient height can be used to rewrite the group contribution expressions in terms of molecular signatures. This is because every molecular group can be considered as a tree with D < h, where h is the height of signature used to represent the molecular groups. Therefore, signatures of that

C4(O2(C)

C3(C3(CC) 7095

C1(C)

C3(CCC)) → CH3COCH

C1(C)) → CH3CHCH

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

For instance, the first signature has the root color C4, which indicates the carbon with valency four in a hydrogen-suppressed graph. The atoms in the first layer are oxygen with a double bond, a carbon with valency three and a carbon with valency one. With the available first order groups, this signature distinctly represents a CH3CO group, with a CH group in its proximity. Similar logic can be seen in the second example as well where the CHCH group is connected to a CH3 group. In general, it has been identified that the height of the signature needed to represent second order groups can be two or three. Therefore, it can be ensured that, the signature height required to identify the higher order effects is three. 3.3. Molecular Signatures with Different Heights. In a typical integrated process and molecular design problem, there will be a number of property targets to be satisfied corresponding to the optimum process performance. To ensure that the solutions obtained after solving the molecular design problems are accurate, it is important to apply the best property models corresponding to each target. Since different property models make use of different topological indices and group contribution models, the height of the molecular signature required to represent them will be different. Therefore, in order to apply the connectivity principles, it is necessary to represent all the property models with a unique signature height. In addition, it is important to have a methodology to make use of the highest signature height employed in the transformations of topological indices. This is because, as the height of the signature increases, the degeneracy of the solutions will decrease and the use of the highest signature height ensures a solution with minimum degeneracy.23 The formulation of property operators from signatures provides a convenient way to achieve this target. When property operators have been formed from topological indices and group contribution models in terms of signatures, the unknown variable will be the number of appearances of signatures. Based on the type of the original property model, the signature height will be different. However, from every signature with height h, the number of potential signatures with height h − n, where n < h, can be generated. Therefore, the number of appearances of all signatures with height h − n can be represented as the sum of all signatures with height h. For illustration, two of the property models used in the case study are analyzed. Two property targets are modeled using four different topological indices.32,33 v

to ensure minimum degeneracy. In this case, a signature of height three is selected. In the next step, all signatures of height three will be classified on the basis of the color of signatures at height two. For example, consider the following signature of height three: O1(C2(O1(C)C2(CC)))

The root of this signature at height two is O1(C2(OC)). It can be seen that a number of other signatures with height three can be generated from the root O1(C2(OC)). Therefore, the number of appearances of O1(C2(OC)) will be the sum of the number of appearances of all such signatures with height three. Therefore, all signatures with height three can be classified into separate sets based on the color at height two, and the number of height two signatures will be the sum of elements in the corresponding set. This principle can be applied into the representation of signatures of height one. For instance, the root signature of height one corresponding to O1(C2(OC)) is O1(C). Therefore, the number of appearances of O1(C) can be represented as the sum of all signatures whose root is O1(C). When this transformation is applied, eqs 17 and 18 can be rewritten exclusively on the basis of the number of appearances of signatures of height three, and this ensures the solution with minimum degeneracy. 3.4. Design of Cyclic Molecules. Molecular signatures can be formed for cyclic molecules as well. However, the connectivity rules will be different for molecules with rings in their structures according to eqs 7 and 8. It can be noted that the complete connectivity of the signatures depends upon the number of rings in the final structure. Therefore, the design of cyclic structures will be done in separate steps. The connectivity rules change according to the number of rings in the final structure. Therefore, while designing structures with multiple edges, the problem will be solved in multiple steps corresponding to the number of rings in the target solutions. Many topological indices are defined differently for cyclic structures. One example is connectivity indices of different orders. Therefore, it is necessary to use a different coloring function for an atom which should form part of a ring to distinguish that from an atom which is not a part of the ring even though both atoms can have the same neighbors. An example is shown in Figure 4. In both

v

log(Koc) = 0.53(1 χ ) − 1.25(Δ1 χ ) − 0.72(Δ0χ ) + 0.66 (17)

Vm = 33.52ε + 30.67

Figure 4. Labeled graphs.

(18)

structures shown in Figure 4, the atomic signature of height one is the same for the atom labeled as 2:

The topological indices used in this case study are transformed into their corresponding signature equivalents: 0

1

0

1

χ = αG· χ (root( Σ))

1

χ =

ε=

1 2 ( αG) ·1 χ (root(2Σ)) 2

1 3 ( α i) ·ε(root(3Σ)) 2

C3(CCC)

However, in the first structure, since the atom labeled as 2 is part of a ring, many topological indices like connectivity indices and group contribution values will be different from the atom labeled as 2 in the second structure. Therefore, the atoms that form a part of a ring will be colored with the letter r along with their usual color. So, the signatures corresponding to the first structure in Figure 4 will be rC(rCrCC).

(19)

(20)

(21)

4. ENUMERATION OF MOLECULAR STRUCTURES FROM SIGNATURES The generation of molecular graphs from molecular descriptors is one of the most challenging issues in inverse design. However,

In this problem, there are property models that use signatures of height one, two, and three. To apply the connectivity rules, all these signatures must be of the same height. According to the concepts discussed earlier, the highest signature must be selected 7096

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

it is possible to generate molecular graphs from a given set of signatures. The following procedure has been developed to generate the molecular structures from the signature building blocks: 1. Select any signature of height h randomly from the solution set. 2. Consider the signature starting from the first layer of the selected signature (the signature/signatures with height h − 1). This signature must form the first n − 1 layers of the signature attached next to the first signature. 3. Generate signatures of height h − 1 among the rest of the signatures. 4. Select the signature whose h − 1 height is the same as the signature starting from the first layer of the first signature. 5. If there is more than one signature that satisfy step 4, consider the last layer of the contesting signatures. The signature whose color in the last layer matches with the (n − 1)th layer of the first signature will then be selected for forming the bond. This is possible because when a bond is formed between two vertices, the same layer will appear in the second signature at the next level. 6. If there is more than one signature that satisfy step 5, it does not matter which signature is selected for forming the bond. This is because two signatures with the same height h will form isomorphic graphs. 7. After forming the first bond, repeat the same procedure for the other signatures starting at layer one. All matching signatures will form subsequent bonds to the root signature. 8. Repeat the same procedure in the newly formed levels until all the signatures in the solution set have appeared in the signature chain. 9. The signatures will be replaced with the root atoms in each signature. 10. The hydrogen atoms must be added to satisfy the valencies of all the atoms to complete the final molecular structure. The above procedure has been summarized in the flowchart given in Figure 5.

5. MOLECULAR SIGNATURES IN REACTIVE SYSTEMS To track the same set of target properties, the signature of a reaction, σ(R) can be defined as shown in eqs 22 and 23: h

σ (R ) = f (σ(A)σ(B)) h

ΔΩ f 2 = f ( σ (R ))

Figure 5. Structure enumeration from signatures.

(22)

atoms that are at a height 2 from the signature marked as “1”. In general, the difference in property contributions of different signatures can be represented using eq 25:

(23)

Where σ(A) and σ(B) are the signatures corresponding to the reactants and products. During a reaction, one or more of the molecular groups in the compound will be replaced by a new group(s). The change in groups results in changes in the molecular signatures. The height of the signature considered for tracking the target property will determine the signatures from the original molecular groups that will be repeated in the product as well. For instance, consider the esterification of alcohols with acetic acid:

ΔΩ = Δ[(∑ hα h − k)reactant − (∑ hα h − k)products ]

(25)

where, (hαh‑k)reactants is the sum of property operators for the signatures of the atoms that are at height “h − k” from the group that would be replaced, (hαh‑k)products is the sum of property operators for the signatures of the atoms that are at height h − k from the group that is formed after the reaction, and ΔΩ is the difference in property operator value as a result of the reaction. Since the atomic signatures of molecular building blocks that are farther from height h − k will be the same even after the reaction, the property operators will be same for the rest of the molecule. Now, the property operator corresponding to the molecule formed after a specific type of chemical reaction can be estimated:

According to eq 23, the signatures of atoms originating from the last two groups in “R” will be changed because of the esterification reaction. Therefore, the change in the property function will be represented as the changes in atomic signatures of the

Ω reaction = ΔΩ + Ω(P) 7097

(26)

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

where, Ωreaction is the property operator corresponding to the final molecule when the initial compound used has a property operator value of Ω(P). 5.1. General Problem Statement. Design the molecules with the best dominant property that satisfy a set of property constraints identified during the process design when the molecule is formed after a chemical reaction. 5.2. Problem Formulation. The property operator of a property P is a function, θ, of the actual property which obeys linear mixing rules:

θ = f (TI) N

TI =

• Estimate the property operators corresponding to the changes possible on the molecular groups because of the reaction. • Estimate the property operator values corresponding to potential types of reactions. • Based on the structural constraints and transformation techniques explained in section 3, identify the signatures and generate candidate molecules using the algorithm in section 4. The overall methodology to solve the process and product design problems now can be represented in its entirety as shown in Figure 6.

(27) N

∑ hα i·TI(root( h∑ )) = ∑ hα iLi i=1

i=1

6. CASE STUDY 1ACID GAS REMOVAL 6.1. Problem Statement. A gas treatment process uses methyl diethanol amine, MDEA (OH(CH2)4N(CH3)OH) to remove acid gases from an alkane-rich feed. Two recycled process streams (S1 and S2), which mainly consists of 2,5dimethyl hexane also constitute the feed and will be separated from the amine after the acid gas absorption. The property and flow rate data of both MDEA and the recycled streams are summarized in Table 1. The flowsheet of the process is given in Figure 7: The objective of this design problem is to identify a solvent that can be used to reduce the consumption of MDEA by 50% and utilize all available recycle streams. The solvent when mixed with MDEA and the recycle stream must satisfy a set of environmental regulations. The solvent must also possess the qualities of efficient acid gas removal agents. Therefore, the molecular building blocks have to be selected such that the final structure should be similar to known efficient acid gas removal agents. The sink performance requirements are functions of vapor pressure (VP), heat of vaporization (Hv) and molar volume (Vm). Apart from the process constraints, the designed molecule should have minimum soil sorption coefficient (log Koc) to avoid accumulation of the escaping solvent in one place and high toxic limit concentration (TLC) value. The energy index for the separation of alkane from the final molecule must be low so that the alkane can be easily separated after the absorption. 6.2. Process Design. The first step in this integrated process and molecular design problem is to identify the targets for the molecular design from the process design constraints. The property operators corresponding to the target properties are defined by the eqs 34−36:

(28)

N

Ω(P) =

∑ xiLi i=1

(Ω reaction)j = ΔΩ + Ω(P)

(29) (30)

This problem can be formulated as a bilevel optimization problem. The first level is to identify the right sets of atomic signatures that change after the reaction and the second level is to identify the right combination of those atomic signatures along with the atoms whose signatures would not change with the reaction. To solve the first level, the groups can be represented as a set of binary variables:

Φ=

∑ hα·ei h−k

(31)

where ei is the number of appearances of each signature that will change because of the reaction. Φ will be optimized to get the optimum value of Ωj. The components of ΔΩ can be estimated corresponding to the type of reaction. Since the difference is restricted to the atoms that are at height h − k from the group on which the reaction can take place, this modification will add only a small number of additional parameters in the final problem formulation. The dominant property, which is expressed in terms of the occurrences of atomic signatures, can be optimized subject to the property constraints. If Ωj is the property operator corresponding to the dominant property and Ωij is the normalized property operator of molecule i, an optimization problem can be formulated as follows:

max /min Ωj

(32)

Ωmin ≤ Ωj ≤ Ωmax j j

(33)

NS

ψVP =

∑ xs VP1.44 s=1

(34)

NS

The principles of graph theory are applied to ensure that the signatures connect together to form feasible molecules. To do that, a number of structural constraints that ensure the consistency of signatures are introduced in terms of the number of appearances of atomic signatures. 5.3. Stepwise Solution Procedure. • The property targets for the input molecules to provide the optimum process performance will be calculated using eq 1. • Identify the QSAR/QSPR/group contribution models that predict the properties corresponding to the optimum performance. • Identify heights of molecular signatures corresponding to the TIs used in QSPR.

ψV = m

∑ xsVm s=1

(35)

NS

ψH = v

∑ xsHv s=1

(36)

The property targets for molecular design have been identified from eqs 34−36. They are listed in Table 2: 6.3. Molecular Design. Apart from the process constraints, there are two environmental constraints to be considered during the design of the new compounds. The maximum value of toxic limit concentration (TLC) is kept as 10 ppm and the soil sorption coefficient needs to be minimized. Now, the next step is to find suitable property models to predict these properties from 7098

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

Figure 6. Integrated process-product design using signature descriptors.

Here, Tbp is the boiling point of the liquid and T is the temperature at which VP is measured, which is 323 K in this example. There are group contribution relationships available for the estimation of molar volume. However, a more accurate estimation technique is available for molar volume based on edge adjacency indices.33 The available relationship is given in eq 40:

Table 1. Property and Flow Rate Data for Acid Gas Removal Problem feed properties property

S1

S2

MDEA properties

lower bound for sink

upper bound for sink

VP (mm Hg) Hv (kJ/mol) Vm (cm3/mol) flow rate (kmol/h)

63.2 39 178 50

43.1 41 189 70

0.26 89 114 180

60 110 300

10 90 140 350

Vm = 33.52ε + 30.67

where ε is the edge adjacency index. The edge adjacency index (ε) is a topological index developed by Estrada.35 Two edges are adjacent if one vertex in a chemical graph is incident to both the edges. If there are m edges in a graph and gij are the elements in that graph, the edge adjacency matrix, E= [gij]mxm can be defined as shown in eq 41:

molecular structure using topological indices or group contribution models. For heat of vaporization, there is a reliable group contribution model available, which is given in eq 37:31 ΔH v = hv0 +

∑ nihvi

⎧1 if ei and ej are adjacent gij = ⎨ ⎩ 0 otherwise ⎪



(37)

δ(ek) =

ng g=1

Ns

(38)

ε=

Nt t=1

(42)

This can be calculated as the sum of elements of kth row in the matrix E. Now, the edge adjacency index can be calculated using eq 43:

∑ [δ(ei) δ(ej)k]−l 1/2 l

∑ nstb2 + ∑ nttb3] s=1

∑ gik i

1.7

Tb = tb0 ln[∑ nitb1 +

(41)

The edge degree δ(ek) has been defined in eq 42:

For vapor pressure, there are no group contribution relationships available. Nevertheless, there is an empirical relationship (eq 38) that can be used to calculate vapor pressure from the boiling point.34 For boiling point, there is a group contribution expression available:31 ⎛ Tbp ⎞ log VP = 5.58 − 2.7⎜ ⎟ ⎝T ⎠

(40)

(43)

Where the sum is over all l adjacent edges in the graph. Here, k is a constant, which is defined in eq 44:

(39) 7099

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

Figure 7. Acid gas removal flowsheet.

Now, the topological indices involved in this case study are the following:

Table 2. Property Targets for Molecular Design property VP (mm Hg) Hv (kJ/mol) Vm (cm3/mol)

LB

UB

57 40

15.8 157 224

⎧1 if ei and ej are adjacent k=⎨ ⎩ 0 otherwise

0

χ = 1α G·0χ (root(1Σ))

If the graph contains heteroatoms, it is necessary to account for the differences in bonds formed between heteroatoms and carbon from the rest of the bonds. In that case, the values of gik have been replaced with KC‑X, which is the value corresponding to the bond between carbon and the heteroatom.36 The KC‑X parameters are related to the resonance integral associated with the bond between the heteroatoms and the carbon atom.37 Different values have been reported in the literature for KC‑X.38 For toxic limit concentration, a QSAR model is available that makes use of the connectivity index of order two:39 v

1

v

p2



2

[D(i) D(j) D(k)]−1/2 (46)

Where, p2 is the number of all paths of length two in the molecular graph. D(i), D(j), and D(k) are valencies of vertices i, j, and k. For the soil sorption coefficient, a QSAR model is available (R = 0.973) that makes use of a variety of connectivity indices:39 v

1 2 ( α G) ·1 χ (root(2Σ)) 2

(50)

1 3 ( α i) ·ε(root(3Σ)) 2

χ =

(51)

1 3 2 ( α G) · χ (root(3Σ)) 2

(52)

Table 3. Property Operators and Targets property VP Hv Vm TLC log(Koc)

v

log(Koc) = 0.53(1 χ ) − 1.25(Δ1 χ ) − 0.72(Δ0χ ) + 0.66 (47)

Where, Δχ is known as the delta connectivity index. It can be calculated using eq 48: (Δχ ) = (χ )np − χ

(49)

As discussed in section 3.2, the group contribution models with second order group contributions can be represented in terms of signatures of height two or three. The highest signature height among topological indices is three as seen in eqs 49−52. Therefore, the maximum height of signatures required in this problem is three. Now, the property operators are formed corresponding to the target properties and the upper and lower limits are calculated. The values are given in Table 3.

(45)

i ,j,k=1

χ =

ε=

The connectivity index of order two can be obtained from eq 46: (2 χ ) =

v

The next step is to rewrite the topological-index-based expressions in terms of signature descriptors. Equations 49−52 will provide the representation of topological indices used in this case study and the signature descriptors: (44)

log(TLC) = 4.204 − 1.385(2 χ )

v

χ , 0χ , 1 χ , 1 χ , 2 χ , ε , GC models





v

0

(48)

Ωj

LB

6.75 exp(Tb/tb0) Hv - hv0 45.3 (Vm − 30.67)/33.52 0.28 (4.204 − log(TLC))/1.385 2.21 log(Koc) − 0.66 minimum

UB 145.3 5.75

In order to select the molecular building blocks, the structures of currently used amine absorbents have been analyzed. The following are common absorbents along with MDEA:

Where, χnp is the molecular connectivity index of any height for the nonpolar equivalent structure of the molecule. 7100

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

N3(C1(N3(CCC))C2(N3(CCC)C2(CC))C2(N3(CCC)C2(OC))) Solution 2: O1(C2(O1(C)C2(CC))) O1(C2(O1(C)C2(NC))) N2(C2(N2(CC)C2(OC))C2(N2(CC)C2(CC))) C2(C2(C2(CC)C2(OC)C2(N2(CC)C2(CC))) C2(C2(C2(CC)C2(NC)C2(O1(C)C2(CC))) C2(C2(C2(CC)C2(NC)N2(C2(NC)C2(NC))) C2(C2(O1(C)C2(NC)N2(C2(NC)C2(NC))) C2(O1(C2(OC))C2(C2(OC)C2(CC)) C2(O1(C2(OC))C2(C2(OC)N2(CC)) Solution 3: O1(C2(O1(C)C2(CC))) O1(C2(O1(C)C2(NC))) C1(N3(C1(N)C2(CN)C2(CN))) C2(O1(C2(OC))C2(C2(OC)C3(=CC)) C2(O1(C2(OC))C2(C2(OC)N3(CCC)) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)C3(=CC))) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)O1(C))) N3(C1(N3(CCC))C2(N3(CCC)C2(CC))C2(N3(CCC)C2(OC))) C2(C2(O1(C) C2(CC))C3(=C3(=CC)C2(CC))) C2(C2(N3(CCC) C2(CC))C3(=C3(=CC)C2(CC))) C3(=C3(=C3(=CC)C2(CC))C2(C3(=CC)C2(CO))) C3(=C3(=C3(=CC)C2(CC))C2(C3(=CC)C2(NC))) Solution 4: O1(C2(O1(C)C2(NC))) O1(C2(O1(C)C3(=CC))) O1(C2(O1(C)C2(NC))) C2(C2(C2(CC)C3(=CC)C2(N3(CCC)C2(CC))) C2(C2(C2(CC)C2(NC)C3(=C3(=CC)C2(CC))) C2(O1(C2(OC))C2(C2(OC)N3(CCC)) C2(O1(C2(OC))C3(C2(OC)C3(=CC)) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)C2(CC))) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)O1(C))) N3(C1(N3(CCC))C2(N3(CCC)C2(CC))C2(N3(CCC)C2(OC))) C3(=C3(=C3(=CC)C2(CC))C2(C3(=CC)O1(C))) C3(=C3(=C3(=CC)C2(OC))C2(C3(=CC)C2(CC))) Solution 5: O1(C2(O1(C)C2(CC))) N1(C2(N1(C)C2(CC))) C2(C2(C2(CC)C2(OC)C2(N1(C)C2(CC))) C2(C2(C2(CC)C2(NC)C2(O1(C)C2(CC))) C2(C2(C2(CC)C2(NC)N1(C2(NC))) C2(O1(C2(OC))C2(C2(OC)C2(CC)) The final molecular structures are as follows: OH-(CH2)2−N(CH3)−(CH2)3−OH OH-(CH2)4−NH−(CH2)2−OH OH-(CH2)2−N(CH3)−(CH2) 2−CHCH−(CH2)2− OH OH-(CH2)2−N(CH3)−(CH2)3−CHCH−CH2−OH OH-(CH2)4NH2

monoethanolamine NH2CH2CH2OH diethanolamine OHCH2CH2NHCH2CH2OH methyl diethanolamine OHCH2CH2N(CH3)CH2CH2OH diisopropylamine (CH3)2CHNHCH(CH3)2 Now, all the molecular groups present in these molecules have been selected as the molecular building blocks for the new absorbent. In addition, for the demonstration of the ability of the signature-based algorithm to handle the design of molecules with multiple bonds, one additional group has been included. Here is the list of all the molecular building blocks for this design: OH

CH3

CH 2

CH3N

CH 2NH 2

CH 2NH

CHCH

NHCH

Now signatures have been generated corresponding to these molecular groups. Only those signatures are selected that form structures similar to the existing amine absorbents. Now, a MILP problem can be formulated: min Ω KOC

(53)

Ω VP ≥ 6.75

(54)

Ω Hv ≥ 45.3

(55)

Ω Hv ≤ 145.3

(56)

ΩVm ≥ 0.28

(57)

ΩVm ≤ 5.75

(58)

Ω TLC ≥ 2.21

(59)

⎡⎛

∑ Dixi = 2⎢⎢⎜⎜∑ xi + ⎣⎝

i

∑ (li → lj)h

i

=

1 2

⎤ ⎞ xi⎟⎟ − 1⎥ ⎥⎦ doublebonds ⎠



∑ (lj → li)h

(60) (61)

∑ ηixi = 2K i=j

(62)

∑ ηixi < ∑ xj

(63)

N

Ωj s =

∑ Lixi

(64)

i=1 N

Ωj s =

N

∑ Cixi + M ∑ Sjxj i=1

j=1

(65)

Now, different solutions are obtained by placing integer cuts after each solution. The best five solutions in terms of signatures are shown below: Solution 1: O1(C2(O1(C)C2(CC))) O1(C2(O1(C)C2(NC))) C1(N3(C1(N)C2(CN)C2(CN))) C2(C2(C2(CC)O1(C)C2(N3(CCC)C2(CC))) C2(O1(C2(OC))C2(C2(OC)C2(NC)) C2(O1(C2(OC))C2(C2(OC)N3(CCC)) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)C2(OC))) C2(N3(C1(N)C2(NC)C2(NC))C2(C2(NC)O1(C)))

7. CASE STUDY 2ESTER PRODUCTION In this case study, the target is to identify an alcohol that produces an ester with a given set of properties. The property targets for the ester 7101

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research



are given in Table 4. The property models from the previous case study will be used here as well.

Article

AUTHOR INFORMATION

Corresponding Author

*Tel.: +1 334 844 2064. E-mail: [email protected].

Table 4. Property Constraints

Notes

property

upper bound

lower bound

boiling point, BP (°C) log(TLC) (ppm) log(Koc)

170

110 1

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors gratefully acknowledge the financial support provided by the Southeastern Regional Sun Grant Program, the US Department of Energy (Award No. DE-EE003115), and the US Department of Agriculture NIFA-AFRI (Award No. 2011-6700920077).

minimum

The molecular descriptors used in the property models are transformed into the corresponding molecular signatures. The maximum signature height appearing in this case study is 3. It can be seen that the following height two signatures (and the similar signatures from the C4 root) have the potential to form a different signature because of the reaction: O1(C2(OC)),O1(C3(OCC) O1(C4(OCCC)) C2(O1(C)C2(CC)) C2(O1(C)C3(CCC)) C3(O1(C)C1(C)C2(CC)) C3(O1(C)C2(CC)C2(CC)) C3(O1(C)C3(CCC)C2(CC)) C3(O1(C)C3(CCC)C3(CCC)) The ΔΩ was estimated for the height 3 signatures originating from these height 2 signatures. An optimization problem was formulated using eqs 27−33 along with relevant structural constraints described in section 3 and solved for the minimum value of the soil sorption coefficient. The formulation of the structural constraints is similar to the previous case study. The best three candidates along with their predicted property are given in Table 5:



Table 5. Final Molecular Structures molecule

BP (K)

log TLC (ppm)

log Koc (ppm)

(CH3)2CH(CH3)CHOH (CH3)2CH(CH2)CH2OH CH3CH2CH(CH3)CH2OH

114 125 125

1.762 1.707 1.609

1.892 1.921 1.941



8. CONCLUSIONS The newly introduced concept of molecular signature descriptors has been used as a tool for the molecular design part of the general reverse problem formulation framework. The ability of signatures to represent a wide variety of topological indices allowed the integrated framework to design molecules corresponding to a number of target properties. This is a significant improvement from current algorithms for molecular design as topological indices with different levels of structural information have been used simultaneously. Furthermore, the new molecular design algorithm can simultaneously consider group contribution models with higher order contributions along with different varieties of topological indices. A significant aspect of the new molecular design algorithm presented in this paper is that the algorithm is formulated to track changes in the properties that occur as the result of different types of reactions. The developed algorithm focuses on the changes in the reaction sites of the molecules to design the components involved. Therefore, the applicability of the concept of reverse problem formulations has been extended to a wider variety of problems. Future efforts will focus on extending the present algorithm to systems with multiple and/or side reactions. This is possible by representing the potential pathways as a collection of unconnected graphs. The challenge here would be to form the structural constraints that relate the unconnected graphs and to prevent the generation of infeasible pathways.

NOMENCLATURE D = number of degrees deg = degree GCM = group contribution method G(z) = molecular sub graph of atom z h = height of signature M = number of edges Ng = total number of first order molecular groups Ns = total number of streams Pjg = contribution of property j from group g TI = topological index V = vertex x = number of signatures ys = fractional contribution ψj (Pj) = molecular property operator of the jth property th ψref j (Pj) = reference operator for j property Ωj = normalized property operator for property j η = number of occurrences edges with same color at both ends in one signature σ = molecular signature α = number of each signature θ = property function REFERENCES

(1) Achenie, L. E. K.; Gani, R.; Venkatasubramanian, V. Computer Aided Molecular Design: Theory and Practice; Elsevier: Amsterdam, The Netherlands, 2003. (2) Sahinidis, N. V.; Tawarmalani, M.; Yu, M. Design of alternative refrigerants via global optimization. AIChE J. 2003, 49, 1761. (3) Eljack, F. T.; Eden, M. R.; Kazantzi, V.; Qin, X.; El-Halwagi, M. M. Simultaneous process and molecular designA property based approach. AIChE J. 2007, 53, 1232. (4) Chemmangattuvalappil, N. G.; Eljack, F. T.; Solvason, C. C.; Eden, M. R. A novel algorithm for molecular synthesis using enhanced property operators. Comput. Chem. Eng. 2009, 33, 636. (5) Achenie, L. E. K.; Sinha, M. The design of blanket wash solvents with environmental considerations. Adv. Environ. Res. 2004, 8, 213. (6) Kier, L. B.; Hall, L. H. Chemometrics Series, 9: Molecular Connectivity in Structure−Activity Analysis; John Wiley & Sons: New York, 1986. (7) Faulon, J.-L.; Visco, D. P., Jr.; Pophale, R. S. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR Studies. J. Chem. Inf. Comput. Sci. 2003, 43, 707. (8) Kier, L. B.; Hall, L. H.; Frazer, J. W. Design of molecules from quantitative structure−activity relationship models. 1. Information transfer between path and vertex degree counts. J. Chem. Inf. Comput. Sci. 1993, 33, 143. (9) Baskin, I. I.; Gordeeva, E. V.; Devdariani, R. O.; Zefirov, N. S.; Palyulin, V. A.; Stankevich, M. I. Methodology of solution of the inverse problem for the structure-property relationship for the case of topological indices (ChemistryEnglish Trans). Dokl. Acad. Nauk SSSR 1990, 307, 217.

7102

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103

Industrial & Engineering Chemistry Research

Article

(10) Gordeeva, E. V.; Molchanova, M. S.; Zefirov, N. S. General methodology and computer program for the exhaustive restoring of chemical structures by molecular connectivity indexes. Solution of the inverse problem in QSAR/QSPR. Tetrahedron. Comput. Method 1990, 3, 389. (11) Kvasnicka, V.; Pospichal, J. Canonical indexing and constructive enumeration of molecular graphs. J. Chem. Inf. Comput. Sci. 1990, 30, 99. (12) Skvortsova, M. I.; Baskin, I. I.; Slovokhotova, O. L.; Palyulin, V. A.; Zefirov, N. S. Inverse problem in QSAR/QSPR studies for the case of topological indexes characterizing molecular shape (Kier indices). J. Chem. Inf. Comput. Sci. 1990, 33, 630. (13) Trinajstic, N. Chem. Graph Theory, 2nd ed.; CRC press: Boca Raton, FL, 1992. (14) Raman, V. S.; Maranas, C. D. Optimization in product design with properties correlated with topological indices. Comput. Chem. Eng. 1998, 22, 747. (15) Camarda, K. V.; Maranas, C. D. Optimization in polymer design using connectivity indices. Ind. Eng. Chem. Res. 1999, 38, 1884. (16) Garg, S.; Achenie, L. E. K. Mathematical programming assisted drug design for nonclassical antifolates. Biotechnol. Prog. 2001, 17, 412. (17) Siddhaye, S.; Camarda, K.; Southard, M.; Topp, E. Pharmaceutical product design using combinatorial optimization. Comput. Chem. Eng. 2004, 28, 425. (18) Camarda, K. V.; Sunderesan, P. An optimization approach to the design of value-added soybean oil products. Ind. Eng. Chem. Res. 2005, 44, 4361. (19) McLeese, S. E.; Eslick, J. C.; Hoffmann, N. J.; Scurto, A. M.; Camarda, K. V. Design of ionic liquids via computational molecular design. Comput. Chem. Eng. 2010, 34, 1476. (20) Lin, B.; Chavali, S.; Camarda, K.; Miller, D. C. Computer-aided molecular design using Tabu search. Comput. Chem. Eng. 2005, 29, 337. (21) Eslick, J. C.; Ye, Q.; Park, J.; Topp, E. M.; Spencer, P.; Camarda, K. V. A computational molecular design framework for crosslinked polymer networks. Comput. Chem. Eng. 2009, 33, 954. (22) Visco, D. P.; Pophale, R. S.; Rintoul, M. D.; Faulon, J.-L. Developing a methodology for an inverse quantitative structure−activity relationship using the signature molecular descriptor. J. Mol. Graphics Modell. 2002, 20, 429. (23) Faulon, J.-L.; Visco, D. P., Jr.; Pophale, R. S. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 2003, 43, 707. (24) Faulon, J.-L.; Churchwell, C. J.; Visco, D. P., Jr. The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. J. Chem. Inf. Comput. Sci. 2003, 43, 721. (25) Weis, D. C.; Faulon, J.-L.; LeBorne, R. C.; Visco, D. P., Jr. The signature molecular descriptor. 5. The design of hydrofluoroether foam blowing agents using inverse-QSAR. Ind. Eng. Chem. Res. 2005, 44, 8883. (26) Eden, M. R.; Jorgensen, S. B.; Gani, R.; El-Halwagi, M. M. A novel framework for simultaneous separation process and product design. Chem. Eng. Process. 2004, 43, 595. (27) Gani, R. Chemical product design: Challenges and opportunities. Comput. Chem. Eng. 2004, 28, 2441. (28) Shelley, M. D.; El-Halwagi, M. M. Component-less design of recovery and allocation systems: A functionality-based clustering approach. Comput. Chem. Eng. 2000, 24, 2081. (29) Chemmangattuvalappil, N. G.; Solvason, C. C.; Bommareddy, S.; Eden, M. R. Reverse problem formulation approach to molecular design using property operators based on signature descriptors. Comput. Chem. Eng. 2010, 34, 2062. (30) Wilson, R. J. Introduction to Graph Theory; Longman Scientific & Technical: Essex, England, 1986. (31) Marrero, J.; Gani, R. Group-contribution based estimation of pure component properties. Fluid Phase Equilib. 2001, 183−184, 183. (32) Bahnick, D. A.; Doucette, W. J. Use of molecular connectivity indexes to estimate soil sorption coefficients for organic chemicals. Chemosphere 1988, 17, 1703. (33) Dai, J.; Jin, L.; Wang, L. Prediction of molar volume of aliphatic compounds using edge adjacency index. Prog. Nat. Sci. 1998, 8, 754.

(34) Sinha, M.; Achenie, L. E. K.; Gani, R. Blanket wash solvent blend design using interval analysis. Ind. Eng. Chem. Res. 2003, 42, 516. (35) Estrada, E. Edge adjacency relationships and a novel topological index related to molecular volume. J. Chem. Inf. Comput. Sci. 1995, 35, 31. (36) Estrada, E. Edge adjacency relationships in molecular graphs containing heteroatoms: a new topological index related to molar volume. J. Chem. Inf. Comput. Sci. 1995, 35, 701. (37) Daudel, R.; Leferbvre, R.; Moser,C. Qunantum Chemistry, Methods and Applications; Interscience Publishers, Inc.: New York, 1959. (38) Ortiz, P. J.; Perez, C. S. Elementos de Mecanica Cuantica y Estructura Atomic; Universidad de La Habana: Havana, Cuba, 1982. (39) Koch, R. Molecular connectivity and acute toxicity of environmental pollutants. Chemosphere 1982, 11, 925.

7103

dx.doi.org/10.1021/ie302516v | Ind. Eng. Chem. Res. 2013, 52, 7090−7103