A New Data Structure for Computational Molecular Design with Atomic or Fragment Resolution


A New Data Structure for Computational Molecular Design with Atomic or Fragment Resolution Hsuan-Hao Hsu, Chen-Hsuan Huang, and Shiang-Tai Lin J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.9b00478 • Publication Date (Web): 08 Aug 2019 Downloaded from pubs.acs.org on August 11, 2019


Journal of Chemical Information and Modeling

Hsuan-Hao Hsu, Chen-Hsuan Huang, and Shiang-Tai Lin*
Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
*To whom all correspondence should be addressed. E-mail: [email protected]

Abstract

A new molecular data structure and molecular structure operation algorithms are proposed for general-purpose molecular design. The data structure allows for a variety of molecular operations for creating new molecules. Two types of molecular operations were developed: uni-molecular and bi-molecular operations. In uni-molecular operations, a child molecule is created from a parent via addition of a functional group, deletion of a fragment, mutation of an atom, etc. In bi-molecular operations, child molecules are generated from two parent molecules through combination or crossover (hybridization). These molecular operations are essential for the creation and modification of molecules for the purpose of molecular design. The data structure is capable of representing linear, branched, multifunctional, and multivalent compounds. Algorithms are developed for deriving the molecular data structure of a molecule from its atomic coordinates and vice versa. We show that this new molecular data structure and the developed algorithms, referred to as MARS (Molecular Assembling and Representation Suite), allow one to generate a comprehensive library of new molecules by performing every possible molecular structure modification.

1. Introduction

The discovery and development of new specialty chemicals for specific purposes (e.g., drugs for curing a disease) are challenging because of the huge combinatorial space¹⁻⁴ of chemical structures to be explored and the need to satisfy multiple molecular property constraints. In the early days, development relied mostly on chemical intuition and trial-and-error experiments.
With advances in computer technology and in our knowledge of the relationship between chemical structure and properties, an efficient approach known as Computer-Aided Molecular Design (CAMD)⁵⁻⁸ has emerged to drastically improve the development of specialty chemicals. CAMD offers new possibilities for the discovery of new chemicals by using computers to search for candidate structures automatically. There are two basic elements in CAMD: the calculation of material properties from a given molecular structure (the forward problem) and the search for molecular structures that meet a given specification of desired material properties (the reverse problem), as illustrated in Figure 1. The forward problem focuses on determining the properties from a given molecular description. The property determination methods can be based on quantum-chemical calculations, molecular simulations, or theoretical or semi-empirical models (such as quantitative structure-property relationships, group contribution methods, etc.). The reverse problem is a mixed-integer nonlinear programming (MINLP) problem aiming at finding the correct molecular structures that


manipulation at the atomic scale. However, direct manipulation of SMILES strings was sometimes inconvenient, especially for cyclic and aromatic structures. The use of atoms as building blocks not only requires additional rules to handle the connectivity of atoms but is also prone to producing chemically unstable structures. LEA3D³⁰ (Ligand Evolutionary Algorithm 3D, 2005) was later developed based on LEA to resolve some of these issues. The most significant change is that structure manipulation is carried out mainly on fragments, which prevents the search from visiting unreasonable structures and facilitates the design of large aromatic molecules. Though part of the combinatorial space is sacrificed, chemicals generated by LEA3D are more likely to be synthesizable in the lab.

In addition to LEA and LEA3D, there are many programs specialized for drug design, such as ADAPT³⁵ (2001), BREED³⁶ (2004), FLUX³⁷⁻³⁸ (2006), GANDI³⁹ (2007, free for academic use), GROWMOL⁴⁰ (1994), LEGEND⁴¹ (1993), LIGBUILDER⁴² (2000, free), LIGMERGE⁴³ (2012), LUDI⁴⁴⁻⁴⁸ (1991, commercially available), PROLIGAND⁴⁹⁻⁵⁴ (1994), SPROUT⁵⁵ (1992, commercially available), SYNOPSIS⁵⁶ (2003, commercially available), TOPAS⁵⁷ (2000), and MEGA⁵⁸ (2008, commercially available). Interested readers are referred to some excellent reviews¹,⁸ for further details. Some of these programs share the same design goal but differ in their molecular data structure (MDS).

The MDS used in CAMD programs can be broadly categorized into three types: 1D SMILES representation, 2D graphical representation, and 3D coordinates representation. The SMILES representation (adopted by LEA, LEA3D, and FLUX) has the merit of expressing molecular structures compactly as readable character strings. For example, 1-fluoropentane can be represented as “CCCCCF”. However, the SMILES representation loses 3D structural information. Also, structure manipulations on SMILES can be inconvenient for complex structures.
This inconvenience can be reduced by using 2D and 3D MDS because they contain more detailed information regarding atom connectivity. The graphical representation⁵⁹ (adopted by ADAPT and MEGA) uses an adjacency matrix (i.e., a connectivity table that records the connectivity of a substructure with adjacent substructures) and a distance matrix (i.e., a table that records the minimum number of substructures bridging two substructures) to specify the relative positions of substructures. Structure manipulation can be carried out on these matrices through proper algorithms. However, the graphical representation also loses 3D conformational information. If the molecular design problem is very sensitive to molecular conformation, the 3D coordinates representation may be the proper MDS. The 3D coordinates representation directly records the atom types and their coordinates in a molecule, and thus simplifies realistic molecular structures the least. Most programs for the design of drugs and biomolecules, such as BREED, LEGEND, GANDI, PROLIGAND, GROWMOL, LIGBUILDER, LUDI, SPROUT, and LIGMERGE, adopt the 3D coordinates representation.

In this article, we propose a new array-based molecular data structure (MDS) for the design of specialty chemicals. The merit of this new MDS is the ease of applying molecular operations to generate new chemical structures. 3D atomic coordinates can be generated from the MDS based on the valence shell electron pair repulsion (VSEPR) theory.⁶⁰ The structural manipulations (i.e., crossover and mutation) can be performed at different resolutions (i.e., using atoms or fragments as the building blocks) to generate new chemical species. In addition, we develop algorithms capable of complicated structure manipulations (e.g., ring formation and opening) and of finding all possible combinations between two fragments. By combining this tool with property estimation models and optimization algorithms, computer-aided molecular design⁶¹ for organic compounds and materials can be achieved.

2. Theory

A new array-based molecular data structure (MDS) is developed to represent the atomic connectivity of molecules. This MDS consists of 5 integer arrays that store atomic and molecular fragment information and can be easily manipulated to create new structures. As illustrated in Figure 2, six algorithms are developed for structure manipulations, including uni-molecular operations (addition, subtraction, exchange, and ring formation) and bi-molecular operations (crossover and combination). Furthermore, algorithms are developed to generate the 3-dimensional molecular structure (atomic coordinates) from the MDS and the MDS from the 3D structure. We note that this MDS has been used earlier for finding chemicals with a desired balance of hydrophobicity and hydrophilicity.⁶¹ Here we present its internal structure in detail and improve the robustness and reliability of the molecular operation algorithms.

Figure 2. An overview of MARS, the molecular data structure and structure manipulation functions developed in this work.

2.1 The Library of Base Elements

The MDS uses base elements to represent the atoms in a molecule and their hybridization. The base elements, listed in Table 1, are the building blocks of a molecule. Each base element has 5 attributes: the name, ID, number of bonds the element can form, bond order of each bond, and charge. These attributes are needed for creating the molecular data structure (detailed in the next section) and for all subsequent structural operations, such as crossover and mutation.

Table 1. The attributes of base elements

name                     ID   number of bonds   bond order   charge
Carbon (C)                1   4                 (1,1,1,1)     0
Carbon (C)                2   3                 (2,1,1)       0
Carbon (C)                3   2                 (3,1)         0
Carbon (C)                4   2                 (2,2)         0
Oxygen (O)                5   2                 (1,1)         0
Oxygen (O)                6   1                 (1)           0
Oxygen (O)                7   1                 (2)           0
Nitrogen (N)              8   3                 (1,1,1)       0
Nitrogen (N)              9   2                 (2,1)         0
Nitrogen (N)             10   1                 (3)           0
Fluorine (F)             11   1                 (1)           0
Chlorine (Cl)            12   1                 (1)           0
Bromine (Br)             13   1                 (1)           0
Iodine (I)               14   1                 (1)           0
Ammonium cation (N+)     15   4                 (1,1,1,1)     1
Ammonium cation (N+)     16   3                 (2,1,1)       1
Phosphorus cation (P+)   17   4                 (1,1,1,1)     1
Phosphorus cation (P+)   18   3                 (2,1,1)       1
Sulfur (S)               19   2                 (1,1)         0
Sulfur (S)               20   1                 (2)           0
Phosphorus (P)           21   3                 (1,1,1)       0
Phosphorus (P)           22   2                 (2,1)         0
Phosphorus (P)           23   1                 (3)           0
Fluoride ion (F-)        24   0                 -            -1
Chloride ion (Cl-)       25   0                 -            -1
Bromide ion (Br-)        26   0                 -            -1
Iodide ion (I-)          27   0                 -            -1
Hydroxide ion (OH-)      28   0                 -            -1
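As an illustration of how the base element library of Table 1 might be held in memory, the following is a minimal Python sketch, not the MARS implementation. The names `BaseElement`, `LIBRARY`, and `elements_offering` are hypothetical. The assignment of element names to the rows of Table 1 that carry no printed name (IDs 1-10 and 15-23) follows the order of the name column and is an assumption.

```python
from typing import Dict, NamedTuple, Set, Tuple


class BaseElement(NamedTuple):
    name: str
    bond_orders: Tuple[int, ...]  # one entry per bond; the length is the number of bonds
    charge: int


# Keys are the IDs of Table 1; the values mirror its remaining columns.
LIBRARY: Dict[int, BaseElement] = {
    1: BaseElement("C", (1, 1, 1, 1), 0),
    2: BaseElement("C", (2, 1, 1), 0),
    3: BaseElement("C", (3, 1), 0),
    4: BaseElement("C", (2, 2), 0),
    5: BaseElement("O", (1, 1), 0),
    6: BaseElement("O", (1,), 0),
    7: BaseElement("O", (2,), 0),
    8: BaseElement("N", (1, 1, 1), 0),
    9: BaseElement("N", (2, 1), 0),
    10: BaseElement("N", (3,), 0),
    11: BaseElement("F", (1,), 0),
    12: BaseElement("Cl", (1,), 0),
    13: BaseElement("Br", (1,), 0),
    14: BaseElement("I", (1,), 0),
    15: BaseElement("N+", (1, 1, 1, 1), 1),
    16: BaseElement("N+", (2, 1, 1), 1),
    17: BaseElement("P+", (1, 1, 1, 1), 1),
    18: BaseElement("P+", (2, 1, 1), 1),
    19: BaseElement("S", (1, 1), 0),
    20: BaseElement("S", (2,), 0),
    21: BaseElement("P", (1, 1, 1), 0),
    22: BaseElement("P", (2, 1), 0),
    23: BaseElement("P", (3,), 0),
    24: BaseElement("F-", (), -1),
    25: BaseElement("Cl-", (), -1),
    26: BaseElement("Br-", (), -1),
    27: BaseElement("I-", (), -1),
    28: BaseElement("OH-", (), -1),
}


def elements_offering(order: int) -> Set[str]:
    """Names of the elements that have at least one bond of the given order."""
    return {e.name for e in LIBRARY.values() if order in e.bond_orders}
```

With this layout, `elements_offering(1)` returns the elements that can be attached through a single bond, which is the kind of lookup an addition operation needs.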

2.2 The Molecular Data Structure to Represent a Molecule

The molecular data structure stores the connectivity between the base elements of a molecule. The data structure consists of 5 indices (see Table 2), each being an integer array of size N (the number of base elements contained in the molecule). The element index gives each base element in the molecule a unique number for indexing (from 1 to N). The element type of an element is the ID of the base element as given in Table 1. The parent index of an element is the element index of the preceding element to which it is connected. The bond order index of an element is the order of the bond between the element and its parent element. The cyclic index (if nonzero) marks the two atoms that are connected to form a ring. For example, for a benzene molecule containing 6 base elements of sp2 carbon (base element ID=2), the molecular data structure is given in the third column of Table 2. Note that the parent index of the first element is zero, indicating that it is the first element in the data structure. The parent indices of the subsequent elements are 1 to 5, indicating that the elements are connected one after another. The bond order index of the first element is always zero. The bond order indices of the following elements alternate between 2 and 1, reflecting the alternating double and single bonds in benzene. The nonzero cyclic index of the first and last elements indicates the ring structure of benzene. In what follows, we show that such a molecular data structure is useful for performing molecular operations and can be easily converted from and to 3D atomic coordinates.

Table 2. The molecular data structure (MDS) proposed in this work

index name          stored information                                          example: benzene
element index†      a number used to identify an element in the molecule        (1,2,3,4,5,6)
element type†       ID of the element as given in Table 1                       (2,2,2,2,2,2)
parent index†       index of the former connected atom                          (0,1,2,3,4,5)
bond order index†   bond order with the parent element                          (0,2,1,2,1,2)
cyclic index†       which two atoms are connected to form a cyclic compound     (1,0,0,0,0,1)

†Each attribute is an integer array of size N, where N is the number of base elements contained in the molecule.

2.3 From 3D atomic coordinates to MDS (3d2mds())

To determine the MDS from the atomic coordinates, it is necessary to find the type of base element of each atom and the connectivity between atoms. The connectivity is determined from the separation distances of all atomic pairs: pairs with a separation distance less than the covalent radius⁶²⁻⁶³ are labeled as having a covalent bond. Table S1 shows the connectivity determined for an ethene molecule. A value of 1 indicates that the corresponding two atoms in the top row and left column are connected by a valence bond (e.g., C1 and C2); 0 indicates no valence bond between the two atoms (e.g., C1 and H5). The connectivity table is a symmetric square matrix, as can be seen in Table S1. The bond order of each valence bond can then be determined from the number of elements connected to an element. For example, the two connected carbon atoms in ethene each have 3 connected atoms. Since with single bonds alone these two carbon atoms would not satisfy the octet rule, they must be connected through a double bond. This check runs through every element in the molecule until the bond order of every bond is established, as in Table S2. Once the number of connections and the bond order of each bond are determined, the type of base element of every atom can be determined according to Table 1. (It is noteworthy that such a connectivity table can be prepared using Open Babel.⁶⁴ One can input SMILES strings into Open Babel and convert them into a MOL file, which contains the connectivity information.) The first non-hydrogen atom (C1) is assigned parent index=0 and element index=1. The second carbon atom (C2) is assigned element index=2 and parent index=1. The bond order index of the second carbon atom is 2. The resulting MDS of ethene is given in Table S3. Note that third-row elements may not follow the octet rule,⁶⁵ and therefore the present algorithm may overlook some of the possible chemicals involving third-row elements.

2.4 From MDS to 3D Structure (mds23d())

The 3D molecular structure can be derived from the MDS by considering the geometry of the base elements (according to the hybridization of atomic orbitals) and the covalent radius⁶²⁻⁶³ of the connected bonds. Take ethene as an example. The first atom in the MDS is placed at the origin (0,0,0). Since it is an sp2 atom, we can take the unit vectors of its three bonds to be (1,0,0), (-0.5, 0.866, 0), and (-0.5, -0.866, 0). The second atom can be placed along any one of the three possible bond vectors at a distance from the origin defined by the covalent radius, as illustrated in Figure 3. The preliminary 3D structure thus obtained is then refined by considering the valence interactions. In this work, we consider only the bond stretching and angle bending energies in the molecule.

$$E_{\mathrm{bond}} = \frac{1}{2}\, C_{\mathrm{bond}} \left(R - R_{\mathrm{eq}}\right)^{2} \qquad (1)$$

where Cbond (kJ/mol-Å²) is the constant for bond energies, R (Å) is the bond length, and Req (Å) is the equilibrium bond length between two atoms.

$$E_{\mathrm{angle}} = \frac{C_{\mathrm{angle}}}{\left(\sin\theta_{\mathrm{eq}}\right)^{2}} \left(\cos\theta - \cos\theta_{\mathrm{eq}}\right)^{2} \qquad (2)$$

where Cangle (kJ/mol) is the constant for angle energies, θ is the bond angle, and θeq is the equilibrium angle. A universal value is used for the energy constants: Cbond = 500 kJ/mol-Å² and Cangle = 120 kJ/mol. The equilibrium length comes from the valence bond length, and the equilibrium angle is defined by VSEPR theory.⁶⁰ The steepest descent method⁶⁶ is used to optimize the molecular geometry by minimizing the total valence energy. As an illustration, the initial and refined 3D coordinates of benzene are given in Tables S4 and S5, respectively. The corresponding molecular geometries are shown in Figures S1 and S2. It should be noted that a more accurate 3D geometry can be obtained using force field⁶⁷ based optimization or quantum mechanical calculations.⁶⁸ Interested readers can use software such as Open Babel⁶⁴ for force field-based optimization and Gaussian 09⁶⁹ for quantum mechanics-based optimization.
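The two valence energy terms and the refinement idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the MARS code. It assumes the harmonic form E_bond = (1/2) Cbond (R - Req)^2 for Eq. (1) and the cosine-harmonic form E_angle = Cangle (cos θ - cos θeq)^2 / sin^2 θeq for Eq. (2). The function names, the step size, and the C=C equilibrium length of 1.34 Å used in the usage lines are assumptions made for illustration.

```python
import math

C_BOND = 500.0   # kJ/(mol Å^2), universal bond constant quoted in the text
C_ANGLE = 120.0  # kJ/mol, universal angle constant quoted in the text

# Unit bond vectors of a planar sp2 centre, as quoted in the text.
SP2_DIRECTIONS = [(1.0, 0.0, 0.0), (-0.5, 0.866, 0.0), (-0.5, -0.866, 0.0)]


def bond_energy(r: float, r_eq: float) -> float:
    """Bond stretching energy, assumed harmonic form of Eq. (1)."""
    return 0.5 * C_BOND * (r - r_eq) ** 2


def angle_energy(theta: float, theta_eq: float) -> float:
    """Angle bending energy, assumed form of Eq. (2); angles in radians."""
    return C_ANGLE / math.sin(theta_eq) ** 2 * (math.cos(theta) - math.cos(theta_eq)) ** 2


def place_neighbor(origin, direction, length):
    """Initial position of a bonded atom along one of the bond vectors."""
    return tuple(o + length * d for o, d in zip(origin, direction))


def relax_bond(r: float, r_eq: float, step: float = 1e-3, n_steps: int = 200) -> float:
    """Steepest descent on a single bond length: r <- r - step * dE/dr."""
    for _ in range(n_steps):
        r -= step * C_BOND * (r - r_eq)  # dE/dr of the harmonic bond term
    return r


# Usage: place the second carbon of ethene, then relax a stretched bond.
c2 = place_neighbor((0.0, 0.0, 0.0), SP2_DIRECTIONS[0], 1.34)
relaxed = relax_bond(1.60, 1.34)
```

A full refinement would apply the same update to all Cartesian coordinates using the gradient of the total energy; the one-dimensional case above only shows the update rule.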


Figure 3. Illustration of the conversion from MDS to 3D structure for ethene.

2.5 Molecular Structure Manipulations

We developed six algorithms for manipulating the MDS and generating new chemical species. These algorithms include uni-molecular operations, which generate new chemicals from an existing one, and bi-molecular operations, which generate new chemicals from two parent molecules. The uni-molecular operations consist of addition, subtraction, exchange, and ring formation. The bi-molecular operations include crossover and combination. Each of these molecular operations is detailed below.

2.5.1 Uni-molecular Operation – Mutation (add(), subtract(), exchange(), ring())

The mutation operation alters the chemical composition of an MDS. There are four possible mutation methods: addition, subtraction, exchange, and ring formation. The addition operation introduces a base element into the MDS. This operation is feasible for any element that has an empty bond (i.e., a bond not connected to a non-hydrogen atom). Table 3 illustrates the addition of a hydroxyl group to tert-butylbenzene and the corresponding changes in the MDS.

Table 3. Generation of 2-tert-butylphenol (b) from tert-butylbenzene (a) by the addition operation

(a) tert-butylbenzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0

(b) 2-tert-butylphenol
Element index      1 2 3 4 5 6 7 8 9 10 11
Element type       2 2 2 2 2 2 1 1 1 1 5
Parent index       0 1 2 3 4 5 6 7 7 7 5
Bond order index   -1 2 1 2 1 2 1 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0 0
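The addition of Table 3 can be reproduced with a small Python sketch. This is not the MARS implementation: the class `MDS` and the functions `free_bonds` and `add` are hypothetical names, the element index is kept implicit (list position plus one), and the ring-closing bond is assumed to be a single bond because the MDS stores no bond order for it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Bond orders per base element, keyed by Table 1 ID (only the IDs used here).
BOND_ORDERS = {1: (1, 1, 1, 1), 2: (2, 1, 1), 5: (1, 1)}


@dataclass
class MDS:
    element_type: List[int]
    parent: List[int]      # 1-based index of the parent element, 0 for the first element
    bond_order: List[int]  # order of the bond to the parent (-1 for the first element, as in Table 3)
    cyclic: List[int]      # equal nonzero labels mark the two ring-closing elements


def free_bonds(m: MDS, idx: int) -> Counter:
    """Bond orders of element idx (1-based) not yet used by non-hydrogen atoms."""
    free = Counter(BOND_ORDERS[m.element_type[idx - 1]])
    if m.parent[idx - 1] != 0:
        free[m.bond_order[idx - 1]] -= 1    # bond to the parent
    for child, p in enumerate(m.parent):
        if p == idx:
            free[m.bond_order[child]] -= 1  # bonds to the children
    if m.cyclic[idx - 1] != 0:
        free[1] -= 1                        # assumed single ring-closing bond
    return +free                            # keep only positive counts


def add(m: MDS, site: int, new_type: int, order: int = 1) -> MDS:
    """Append a base element of type new_type, bonded to element site."""
    if free_bonds(m, site)[order] < 1:
        raise ValueError("the chosen site has no empty bond of this order")
    if order not in BOND_ORDERS[new_type]:
        raise ValueError("the new element cannot form a bond of this order")
    return MDS(m.element_type + [new_type], m.parent + [site],
               m.bond_order + [order], m.cyclic + [0])


# Usage: tert-butylbenzene (Table 3a) plus a hydroxyl oxygen (ID 5) on element 5.
tbb = MDS([2, 2, 2, 2, 2, 2, 1, 1, 1, 1],
          [0, 1, 2, 3, 4, 5, 6, 7, 7, 7],
          [-1, 2, 1, 2, 1, 2, 1, 1, 1, 1],
          [1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
product = add(tbb, site=5, new_type=5)
```

The new element is simply appended to each array, which is why the first ten entries of every row are identical in Tables 3a and 3b.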


The subtraction operation removes a branch from a molecule. Table 4 shows the change in the MDS of tert-butylbenzene when the tert-butyl group is deleted.

Table 4. Generation of benzene (b) from tert-butylbenzene (a) by the subtraction operation

(a) tert-butylbenzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0

(b) benzene
Element index      1 2 3 4 5 6
Element type       2 2 2 2 2 2
Parent index       0 1 2 3 4 5
Bond order index   -1 2 1 2 1 2
Cyclic flag        1 0 0 0 0 1
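The subtraction of Table 4 amounts to deleting an element together with all of its descendants and renumbering what remains. A minimal Python sketch follows; the names `MDS` and `subtract` are hypothetical and this is not the MARS code. It assumes that a parent always appears before its children in the arrays (true for all tables shown here) and that a ring label whose partner is removed is reset to zero.

```python
from typing import List, NamedTuple


class MDS(NamedTuple):
    element_type: List[int]
    parent: List[int]      # 1-based index of the parent element, 0 for the first element
    bond_order: List[int]  # order of the bond to the parent
    cyclic: List[int]      # equal nonzero labels mark the two ring-closing elements


def subtract(m: MDS, idx: int) -> MDS:
    """Remove element idx (1-based) and the whole branch hanging from it."""
    if m.parent[idx - 1] == 0:
        raise ValueError("the first element cannot be subtracted")
    doomed = {idx}
    # Parents precede their children, so one forward pass finds all descendants.
    for j, p in enumerate(m.parent, start=1):
        if p in doomed:
            doomed.add(j)
    keep = [j for j in range(1, len(m.parent) + 1) if j not in doomed]
    renumber = {old: new for new, old in enumerate(keep, start=1)}
    renumber[0] = 0
    labels = [m.cyclic[j - 1] for j in keep]
    # A ring label survives only if both of its elements are kept.
    cyclic = [c if c and labels.count(c) == 2 else 0 for c in labels]
    return MDS([m.element_type[j - 1] for j in keep],
               [renumber[m.parent[j - 1]] for j in keep],
               [m.bond_order[j - 1] for j in keep],
               cyclic)


# Usage: delete the tert-butyl group (element 7 and its children) of Table 4a.
tbb = MDS([2, 2, 2, 2, 2, 2, 1, 1, 1, 1],
          [0, 1, 2, 3, 4, 5, 6, 7, 7, 7],
          [-1, 2, 1, 2, 1, 2, 1, 1, 1, 1],
          [1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
benzene = subtract(tbb, 7)
```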

The exchange operation replaces an existing base element in the MDS with another one from the element library. There are two types of exchange operations: changing the bond type and changing the atom type. Table 5 illustrates the replacement of the two carbon atoms of one of the double bonds in tert-butylbenzene with nitrogen atoms and the corresponding change in the MDS.

Table 5. Generation of 3-tert-butylpyridazine (b) from tert-butylbenzene (a) by the exchange operation

(a) tert-butylbenzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0

(b) 3-tert-butylpyridazine
Element index      1 2 3 4 5 6 7 8 9 10
Element type       8 8 2 2 2 2 1 1 1 2
Parent index       0 1 2 3 3 3 2 7 8 8 10
Bond order index   -1 2 1 1 1 1 1 2 1 1 2
Cyclic flag        1 0 0 0 0 1 0 0 0 0

The ring formation operation creates a cyclic structure by connecting two elements. Table 6 illustrates the change of the tert-butyl group on tert-butylbenzene to a methylcyclopropyl group.


Table 6. Generation of (1-methylcyclopropyl)benzene (b) from tert-butylbenzene (a) by the ring formation operation

(a) tert-butylbenzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0

(b) (1-methylcyclopropyl)benzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 2 0 2
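In Table 6 the ring formation changes only the cyclic flag: the two newly connected elements receive a new common label. The following Python sketch reproduces this bookkeeping. It is not the MARS code; the names `MDS`, `free_single_bonds`, and `ring` are hypothetical, and the sketch assumes that the ring-closing bond is a single bond and that an element carries at most one ring label.

```python
from collections import Counter
from typing import List, NamedTuple

# Bond orders per base element, keyed by Table 1 ID (only the IDs used here).
BOND_ORDERS = {1: (1, 1, 1, 1), 2: (2, 1, 1)}


class MDS(NamedTuple):
    element_type: List[int]
    parent: List[int]      # 1-based index of the parent element, 0 for the first element
    bond_order: List[int]  # order of the bond to the parent
    cyclic: List[int]      # equal nonzero labels mark the two ring-closing elements


def free_single_bonds(m: MDS, idx: int) -> int:
    """Number of single bonds of element idx not used by non-hydrogen atoms."""
    free = Counter(BOND_ORDERS[m.element_type[idx - 1]])
    if m.parent[idx - 1] != 0:
        free[m.bond_order[idx - 1]] -= 1    # bond to the parent
    for child, p in enumerate(m.parent):
        if p == idx:
            free[m.bond_order[child]] -= 1  # bonds to the children
    if m.cyclic[idx - 1] != 0:
        free[1] -= 1                        # assumed single ring-closing bond
    return max(free[1], 0)


def ring(m: MDS, i: int, j: int) -> MDS:
    """Connect elements i and j (1-based) by giving them a new ring label."""
    if i == j or m.parent[i - 1] == j or m.parent[j - 1] == i:
        raise ValueError("the elements must be distinct and not already bonded")
    if m.cyclic[i - 1] or m.cyclic[j - 1]:
        raise ValueError("an element carries at most one ring label in this sketch")
    if free_single_bonds(m, i) < 1 or free_single_bonds(m, j) < 1:
        raise ValueError("both elements need an empty single bond")
    label = max(m.cyclic) + 1
    cyclic = list(m.cyclic)
    cyclic[i - 1] = cyclic[j - 1] = label
    return m._replace(cyclic=cyclic)


# Usage: close a three-membered ring between two methyl carbons of Table 6a.
tbb = MDS([2, 2, 2, 2, 2, 2, 1, 1, 1, 1],
          [0, 1, 2, 3, 4, 5, 6, 7, 7, 7],
          [-1, 2, 1, 2, 1, 2, 1, 1, 1, 1],
          [1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
cyclopropyl = ring(tbb, 8, 10)
```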

2.5.2 Bi-molecular operations – hybridization (crossover(), combine())

The hybridization operation is applied to the MDS of two molecules to generate new molecules. We developed two hybridization algorithms, crossover and combination. The crossover operation dissects each of the two parent molecules into two segments and recombines them into two new molecules. The combination operation simply glues two parent molecules into a new, larger compound.

The crossover operation starts with the selection of one element from each MDS. If the bond orders between the selected atoms and their parent atoms are the same, the crossover operation exchanges the branches or fragments after the two selected atoms. Table 7 illustrates the crossover of tert-butylbenzene and meta-aminophenol to generate aniline and 3-tert-butylphenol.

Table 7. Generation of aniline (c) and 3-tert-butylphenol (d) from tert-butylbenzene (a) and meta-aminophenol (b) by the crossover operation

(a) tert-butylbenzene
Element index      1 2 3 4 5 6 7 8 9 10
Element type       2 2 2 2 2 2 1 1 1 1
Parent index       0 1 2 3 4 5 6 7 7 7
Bond order index   -1 2 1 2 1 2 1 1 1 1
Cyclic flag        1 0 0 0 0 1 0 0 0 0

(b) meta-aminophenol
Element index      1 2 3 4 5 6 7 8
Element type       2 2 7 2 2 5 2 2
Parent index       0 1 2 2 4 5 5 7
Bond order index   -1 2 1 1 2 1 1 2


3. Open Source Software for Utilization of the MDS

The molecular data structure and molecular operation algorithms, MARS, developed in this work are available on GitHub (https://github.com/hsuhsuanhao/MARS). All the results presented in this work can be reproduced using the computer code provided. Instructions for installation and usage of the program are provided in the README file. Pseudocode for executing the examples presented in the Results and Discussion section is given in the Supporting Information.

4. Results and Discussion

One outstanding feature of the proposed MDS is the transparency in generating all possible new chemicals for each molecular operation. In what follows, we use phenol and butane to demonstrate the creation of all possible new molecules for each of the 6 operations developed.

4.1 Uni-molecular operations

4.1.1 Addition

Figure 4. Illustration of the two possible unique sites for the addition operation on butane.

The addition operation can be applied to every element with available bonds (i.e., bonds connected to hydrogen atoms). For example, addition can be applied to all four carbon atoms in butane; however, because of symmetry, there are only two possible unique sites (indicated by (1) and (2) in Figure 4) for the addition of another base element. In the present base element library, there are 11 elements available for single bond addition (C, O, N, F, Cl, Br, I, N+, P+, P, and S). Therefore, 22 molecules can be generated from the addition operation, as shown in Figure 5. Pseudocode for generating all possible molecules through addition is provided in PS1 of the Supporting Information.


Figure 5. Twenty-two possible molecules generated from the addition operation on butane. The 2D structures are generated using OpenBabel⁶⁴ based on the SMILES output of the program developed here.

4.1.2 Subtraction

The subtraction operation can be applied to every element whose parent index is nonzero, i.e., when the size of the molecule is larger than one non-hydrogen atom. Therefore, there are three possible dissecting sites ((1), (2), and (3) indicated in Figure 6) for the subtraction operation on butane. The 3 compounds produced are listed in Figure 7. Pseudocode for generating all possible molecules through subtraction is provided in PS2 of the Supporting Information.


Figure 6. Illustration of the three possible unique sites for subtraction operation on butane

Figure 7. Three possible molecules generated from the subtraction operation on butane.

4.1.3 Exchange

The exchange operation can be applied to any element, resulting in a change of element type and/or bond order. In order to find all possible molecules that can be generated from the exchange operation, we perform the operation based on the bond order between two connected atoms. For example, without changing the bond order of any of the three single bonds in butane, the numbers of possible base elements in the first, second, and third positions of the element type array are 9 (C, N, O, S, P, F, Cl, Br, I), 5 (C, N, O, S, P), and 5 (C, N, O, S, P), respectively (see Figure 8). The fourth position does not need to be considered because of symmetry. Therefore, the first bond (blue line in Figure 8) could have 44 (9*5-1) kinds of combinations (the -1 removes the original C-C bond). Similarly, for the exchange involving the second bond (red line in Figure 8), there are 6 (the number of ways to choose 2 out of 4, C(4,2)) exchanges with two different non-carbon atoms. Furthermore, there are 4 additional exchanges (NN, OO, SS, PP) from the same-atom combinations. As a result, a total of 54 (45-1+6+4) new molecules can be generated from the exchange operation without changing the bond order (see Figure 9a).

Figure 8. Illustration of the possible sites for the exchange operation on butane.

For the exchange operation that changes a single bond to a double bond, the numbers of possible base elements in the first three positions of the element type array are 5 (C, N, O, S, P), 4 (C, N, S, P), and 4 (C, N, S, P). Therefore, there are a total of 30 (5*4 + C(4,2) + 4, where the last term counts C=C, N=N, S=S, and P=P) new molecules (see Figure 9b) from the exchange operation. Similarly, for the exchange that results in a triple bond, the possible base elements are 3 (C, N, P), 1 (C), and 1 (C) in the three positions, and the total number of new molecules that can be generated is 4 (3*1+1) (see Figure 9c). In summary, the exchange operation on butane generates a total of 88 (54+30+4) new molecules. Pseudocode for generating all possible molecules through exchange is provided in PS3 of the Supporting Information.
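The counts 54, 30, and 4 can be checked by brute-force enumeration. The Python sketch below is an independent check, not the pseudocode PS3 of the Supporting Information. It assumes that the element pools for terminal and internal positions are those listed in the text (with internal positions of a single bond restricted to C, N, O, S, and P), that one exchange changes only the two atoms of one bond, and that "symmetry" means two chains are identical when one is the reverse of the other. The names `TERMINAL`, `INTERNAL`, `canonical`, and `exchange_products` are hypothetical.

```python
from itertools import product

# Candidate elements per bond order for terminal and internal chain positions.
TERMINAL = {1: ["C", "N", "O", "S", "P", "F", "Cl", "Br", "I"],
            2: ["C", "N", "O", "S", "P"],
            3: ["C", "N", "P"]}
INTERNAL = {1: ["C", "N", "O", "S", "P"],
            2: ["C", "N", "S", "P"],
            3: ["C"]}


def canonical(atoms, bonds):
    """A chain and its reverse describe the same molecule; keep one of them."""
    forward = (tuple(atoms), tuple(bonds))
    backward = (tuple(reversed(atoms)), tuple(reversed(bonds)))
    return min(forward, backward)


def exchange_products(order):
    """Unique four-atom chains obtained by exchanging the two atoms of one bond."""
    butane = canonical(["C"] * 4, [1, 1, 1])
    found = set()
    for b in range(3):  # bond between chain positions b and b + 1
        pools = [(TERMINAL if pos in (0, 3) else INTERNAL)[order] for pos in (b, b + 1)]
        for x, y in product(*pools):
            atoms = ["C"] * 4
            atoms[b], atoms[b + 1] = x, y
            bonds = [1, 1, 1]
            bonds[b] = order
            found.add(canonical(atoms, bonds))
    found.discard(butane)  # the parent molecule itself is not a new molecule
    return found


counts = {order: len(exchange_products(order)) for order in (1, 2, 3)}
total = sum(counts.values())
```

Under these assumptions the enumeration gives the same totals as the closed-form counts in the text, which supports reading the second and third positions as drawing from five elements rather than nine.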

Figure 9. New molecules generated from the exchange operation on butane: (a) without changing the bond order, (b) resulting in one double bond, (c) resulting in one triple bond. (Note: the crossed bonds in some of the molecules in (b) indicate the existence of cis-trans isomers.)

4.1.4 Ring formation


The ring operation can be performed on any two elements in the molecule that have available bonds. In butane, there are two unique ways to form a cyclic structure, as indicated in Figure 10. The two new compounds from the ring operation are shown in Figure 11. Pseudocode for generating all possible molecules through the ring operation is provided in PS4 of the Supporting Information.
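The two unique ring closures in butane can be reproduced with a small enumeration. This is a minimal sketch rather than the PS4 pseudocode. It assumes a linear chain of identical atoms, each with an available bond, so the only symmetry is the end-to-end mirror. The function name `unique_ring_closures` and the zero-based atom indices are our own.

```python
from itertools import combinations

def unique_ring_closures(n_atoms):
    """Return symmetry-unique pairs of non-bonded atoms in a uniform linear chain."""
    seen = set()
    unique = []
    for i, j in combinations(range(n_atoms), 2):
        if j - i < 2:
            continue  # neighbouring atoms are already bonded
        # Mirror image of the pair under end-to-end reversal of the chain.
        mirror = (n_atoms - 1 - j, n_atoms - 1 - i)
        key = min((i, j), mirror)
        if key not in seen:
            seen.add(key)
            unique.append((i, j))
    return unique

closures = unique_ring_closures(4)       # butane has four carbon atoms
ring_sizes = [j - i + 1 for i, j in closures]
print(closures, ring_sizes)
```

For a four-atom chain the sketch returns the pairs (0, 2) and (0, 3), that is, a three-membered ring carrying a methyl group and a four-membered ring.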

Figure 10. Illustration of two possible ring operations.

Figure 11. Two cyclic molecules generated from the ring operation on butane.

4.2 Bi-molecular operation

4.2.1 Crossover

Figure 12. Illustration of unique dissection locations in phenol and butane for the crossover operation.

Crossover of two molecules can be performed when they have bonds of the same bond order. A dissection point is selected in each of the parent molecules, and each crossover operation generates two child molecules. For example, the unique dissection points in phenol and butane are shown in Figure 12. Since butane contains only single bonds, we consider only the three single bonds in phenol as possible dissection points. Therefore, the total number of possible molecules produced from the crossover operation is 18 (3*3*2), as shown in Figure 13. Pseudocode for generating all possible molecules through the crossover operation is provided in PS5 of the Supporting Information.
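The 3*3*2 count can be illustrated with a toy one-point crossover on linear chains. This sketch does not use the MDS and is not the PS5 pseudocode. Because phenol contains a ring, a hypothetical linear stand-in (the heavy-atom chain of 1-propanol, which also has three single bonds) is used for the second parent. The names `crossover`, `butane`, and `propanol` are assumptions made for illustration only.

```python
from itertools import product

def crossover(parent_a, parent_b, cut_a, cut_b):
    """Cut each chain at one bond and swap the tails, giving two children."""
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2

butane = ("C", "C", "C", "C")
propanol = ("C", "C", "C", "O")  # linear stand-in for the second parent

# A chain of n heavy atoms has n - 1 bonds; cut index k splits after atom k.
cuts_a = range(1, len(butane))
cuts_b = range(1, len(propanol))

children = []
for cut_a, cut_b in product(cuts_a, cuts_b):
    children.extend(crossover(butane, propanol, cut_a, cut_b))

print(len(children))
print(crossover(butane, propanol, 2, 1))
```

Three cuts in each parent with two children per crossover give 18 child structures, counted here before any duplicates or copies of the parents are removed.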

Figure 13. Eighteen possible molecules generated from the crossover operation on phenol and butane.

4.2.2 Combination

Combination of two molecules can be performed when each of the parent molecules has available bonds. For example, there are 4 unique sites on phenol and 2 unique sites on butane for the combination operation, as shown in Figure 14. A total of 8 (4*2) new molecules can be produced by connecting these sites of the two molecules (see Figure 15). Pseudocode for generating all possible molecules through the combination operation is provided in PS6 of the Supporting Information.
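The combination count is a Cartesian product of the unique sites on the two parents. The sketch below is not the PS6 pseudocode. The site labels are assumptions inferred from molecular symmetry (hydrogen-bearing atoms only); Figure 14 remains the authoritative definition of the sites.

```python
from itertools import product

# Assumed symmetry-unique sites with an available bond (labels are hypothetical).
phenol_sites = ["O", "C_ortho", "C_meta", "C_para"]
butane_sites = ["C1", "C2"]

# Each new molecule joins one phenol site to one butane site.
connections = list(product(phenol_sites, butane_sites))

for phenol_site, butane_site in connections:
    print(f"phenol {phenol_site} -- butane {butane_site}")
print(len(connections))
```

With four sites on phenol and two on butane the product contains 8 pairs, matching the 4*2 count in the text.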


Figure 14. Illustration of possible connection sites on phenol and butane for the combination operation.

Figure 15. Eight possible molecules generated from the combination operation on phenol and butane.

4.3 Computer-aided molecular design problem

The combination of the molecular operations developed here allows for the creation of nearly any molecular structure. As an illustration, Figure 16 shows how a benzene molecule can be created from a methane molecule. In this case, sp3 and sp2 carbon atoms are added alternately to the initial methane, and benzene is then obtained by a ring operation on 1,3,5-hexatriene. Pseudocode for these operations is provided in PS7 of the Supporting Information. It should be noted that there are different pathways for generating benzene from methane. For example, one can also generate cyclohexane and then transform three of the carbon-carbon single bonds into double bonds. The molecular operations are therefore a useful tool for creating new molecular species in computer-aided molecular design.
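The alternative pathway through cyclohexane can be checked with a simple valence count. This is a sketch under stated assumptions, not the PS7 pseudocode: the ring is stored as a plain list of six bond orders (not the five-array MDS), carbon is taken to have a valence of 4, and the helper name `implied_hydrogens` is hypothetical.

```python
def implied_hydrogens(ring_bond_orders, valence=4):
    """Hydrogens on each ring atom; atom i lies between ring bonds i-1 and i."""
    n = len(ring_bond_orders)
    return [
        valence - ring_bond_orders[i - 1] - ring_bond_orders[i]
        for i in range(n)
    ]

cyclohexane = [1, 1, 1, 1, 1, 1]

# Exchange every other single bond for a double bond (one Kekule structure).
benzene = [2 if i % 2 == 0 else 1 for i in range(len(cyclohexane))]

h_cyclohexane = implied_hydrogens(cyclohexane)
h_benzene = implied_hydrogens(benzene)
print(sum(h_cyclohexane), sum(h_benzene))
```

Each carbon goes from two hydrogens to one, so the formula changes from C6H12 to C6H6 while every carbon keeps a total valence of 4.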

Figure 16. The creation of benzene from methane through a series of uni-molecular operations developed in this work.

5. Conclusions

A new molecular data structure (MDS) is proposed for general-purpose molecular design problems. Atom type, atom connectivity, and bond order are represented using five integer arrays. Algorithms are developed for altering the chemical ingredients, the connectivity between atoms, and the bond order of a molecule. Two types of molecular operations can be performed: mutation and hybridization. In mutation, new elements can be added, existing fragments can be removed or exchanged, and new bonds can be introduced between existing atoms of a molecule. In hybridization, new molecules can be created from two parent molecules through crossover and combination. Algorithms are also provided so that the 3D coordinates of the atoms in the molecule can be obtained from the MDS. A key advantage of the proposed MDS is the ease of generating new molecules through exhaustive enumeration of all possible operations. In particular, the molecular operations may follow real chemical synthesis pathways to generate new chemicals. The flexibility of the MDS and its operations makes it a powerful tool for molecular design problems. The chemical stability of the generated molecules has not been taken into account. However, different stability filters (thermal stability, air stability, photosensitivity, etc.) may be introduced to remove unstable chemical structures.

Supporting Information Available: The data structure of several representative compounds and the pseudocode for reproducing the examples in this work are provided in the Supporting Information.

Acknowledgement

This research was partially supported by the Ministry of Science and Technology of Taiwan (MOST 107-2221-E-002-112-MY3) and National Taiwan University (NTU-CDP-108L7827). The computation resources from the National Center for High-Performance Computing of Taiwan and the Computing and Information Networking Center of the National Taiwan University are acknowledged.