Article pubs.acs.org/crystal
Estimation of Melting Temperature of Molecular Cocrystals Using Artificial Neural Network Model Rama Krishna Gamidi and Åke. C. Rasmuson* Department of Chemical and Environmental Science, Synthesis and Solid State Pharmaceutical Centre, Bernal Institute, University of Limerick, Limerick, Ireland S Supporting Information *
ABSTRACT: A quantitative structure−activity relationship model has been constructed by artificial neural networks for estimation of melting temperature (Tm) of molecular cocrystals (CCs). On the basis of a literature analysis using SciFinder and Cambridge Structural Database softwares, a database has been created of CCs for four active pharmaceutical ingredients, namely, caffeine, theophylline (THP), nicotinamide (NA), and isonicotinamide (INA). In total, of 61 CCs were included: 14-CAF, 9-THP, 29-INA, and 9-NA. A good correlation was obtained with ANNs to quantify the Tm of the CCs with respect to various coformers. The training process was completed with an average relative error of 2.38%, whereas the relative error for the validation set was 2.89%.
■
INTRODUCTION Approximately 40% of all drug molecules on the market suffer from physicochemical property (PCP) problems1 such as low solubility2 and dissolution rate,3 poor thermal stability4 and hygroscopicity,5 tabletability, etc.6 In general, the properties of drugs largely depend upon the molecular arrangement and intermolecular forces within the solid material. Therefore, control over the PCPs of the drugs can be exercised through control over the interactions present within the solid form. Both academia and the pharmaceutical industry are targeting such PCP problems by employing various strategies such as formation of cocrystals (CCs),7 salts,8 hydrates,9 and solvates.10 It has been suggested that formation of CCs11 is the best approach to modify the PCPs without modifying their covalent bonds of the drug molecules. Moreover, the formation of CCs is particularly useful for cases where the drug molecule suffers from PCP issues, due to the presence of nonionizable functional groups. The number of available CC formers listed as generally recognized as safe (GRAS) is very large. Even though there are many studies on CCs, little work has been done for identification of an ideal coformer for improving the desired PCPs quantitatively. Development of methods for such identification would make the search for a suitable coformers to set the PCPs within the target range, in a more effective way. There are a number of approaches in the literature to guide the work of finding coformers for the formation of CCs such as supramolecular synthon approach,12 ΔpKa rule,13 Fabians approach,14 Hansen solubility approach,15 COSMO-RS virtual cocrystal screening through molecular electrostatic potential surfaces and lattice energy calculations.16 However, there are not many studies on prediction of PCPs of molecular CCs with respect to coformer properties. In 2003, Vishweshwar et al.17 © 2016 American Chemical Society
investigated the alteration of the melting temperature (Tm) of cocrystals, by considering five homologous series of (diacid)· (IN)2 CCs as model components. The Tm of CCs is higher for an even number of carbon atoms than for an odd number of carbon atoms of diacid coformers. Even number carbon atoms diacids can pack themselves more densely than the odd number of carbon atoms diacids where small voids are formed. In another example, Aakeröy et al.18,19 found that the Tm and aqueous solubility of the CCs of bis(pyridinecarboxamido)alkane with aliphatic dicarboxylic acid coformers can be predicted in cases where the CC crystal packing arrangement systematically changes within a series of coformers. Thereafter, Báthori et al.20 studied the Tm behavior and partial aqueous solubility, as a function of O···H and C···H intermolecular interaction percentage. By using a FLEXCRYST program suite, Kuleshova et al.21 has estimated the relative stability and the solubility of flavonoid’s CCs with respect to their pure solid forms, etc. Most recently, Perlovich22 developed an approach to estimate thermodynamic properties of the CCs based on a diagram method and derived a correlation equation to relate the Tm of the CCs to various coformers. There are several linear and nonlinear regression methods which are used extensively for the prediction of properties of compounds, from molecular geometric parameters.23−26 Artificial neural network (ANN) approaches have been found to be more robust and are fast growing statistical machine learning methods27 to construct a model for complex or nonlinear systems. Thus, ANN methods are preferable to solve Received: September 23, 2016 Revised: November 18, 2016 Published: November 29, 2016 175
DOI: 10.1021/acs.cgd.6b01403 Cryst. Growth Des. 2017, 17, 175−182
Crystal Growth & Design
Article
a wide variety of tasks that are hard to solve using traditional rule-based programming, as in the field of chemical engineering for correlation of different parameters,28 system control and for the prediction of properties, etc.29,30 Furthermore, ANNs have been used for the prediction of Tm of organic compounds by quantitative structure−property relationship (QSPR) analysis.31−33 Construction of a model for prediction of Tm of a single component is a difficult task because it depends upon various parameters such as the crystal packing arrangement, molecular conformation and strength of inter-/ intramolecular hydrogen bonding (packing motifs) present in the crystal. Construction of a model for prediction of Tm of CCs with respect to coformer structure present in the CCs becomes an even more difficult task. In the present work, a QSAR analysis is undertaken using a robust ANN for development of a better understanding of how the Tm of CCs of the same active pharmaceutical ingredient (API) depends on the properties of the different coformers. The practical advantage with ANNs is that it is a computer modeling approach that does not require a prior knowledge of the actual process (the factors which are involved in the CC formation), but the models are built by considering available knowledge of the components involved, and the building of an activation/transfer function would transfer the input parameters into the output parameter’s of our interest (through a number of iterations to reduce the prediction error). Moreover, ANNs could give us Tm values of the CCs, quantitatively (instead of an equation or graphical representation) which is not possible by earlier proposed prediction methods. Thus, aiming to construct a model for the prediction of Tm of the CCs using ANN methods, we have been selected and created a database of in total 61 CCs Tm of four different APIs (9-THP, 14-CAF, 29-INA, and 9-NA). Lattice energies (Elatt) were calculated for all selected CCs and individual components. Using eight parameters as input neurons, we have succeeded to construct an ANN QSAR model to predict the melting point of the cocrystals.
■
Figure 1. Architecture of constructed ANN models consists of three main layers: input, hidden, and output layers. The input layer is used to introduce the input variables to the network, and output layer represents predictions of the response variables calculated by ANN.
METHODS AND CALCULATIONS
Scheme 1. (a) Schematic Representation of Methodology and (b) Chemical Structures of the Four APIs Included
ANN Modeling: The architecture of the constructed ANN model is composed of an input layer, a hidden layer(s), weights, a sum function, an activation function, and an output layer, as is illustrated in Figure 1. Each connection has some numeric weights that can be tuned based on trial and error, to make them adaptable to inputs and capable of learning. A multilayer feed-forward ANN model was used, where backpropagation of error algorithm (supervised learning) is employed to calculate ANN weights because multilayers of neurons with nonlinear transfer (activation) function allow the network to efficiently learn (non)linear relationships between input and output vectors. To train the model, eight input parameters used are assumed to be those more influential on the Tm of CCs. These parameters are the lattice energy of the CCs (CCElatt), ratio of the electrostatic interaction percentage of the CC vs the van der Waals (vdWs) interaction percentage (CCEel/EvdW) (see Table S1), the total molecular weight of the CCs (CCMW) (see Table S1), the ratio of molecular weight of API vs the molecular weight of the coformer (APIMW/CFMW) (see Table S1), ΔpKa values (ΔpKa values for the 61 CC complexes were calculated (base/acid), where ΔpKa = pKa (base) − pKa (acid) based on pKa values.13 For complexes involving two acids, the pKa of the more basic compound (with more basic substituents) is taken as pKa (base)) of the CC (CCΔpKa), crystal packing density of the CC (CCCPD) (see Table S1), melting temperature of the API (APITm), and melting temperature of the coformer (CFTm). Moreover, to set an ANN QSAR model of 61 CCs with a good generalization capability, the data
points were divided into two sets (i) 55 data points for the training set and (ii) 6 data points for the validation set (one system from each of THP, CAF, and NA (saccharin (SA), 4-fluoro-3-nitro aniline (4F3NAN) and glutaric acid (GTA) respectively), three from INA (adipic acid (ADP), 4-hydroxy benzoic acid (4HBA) and glutaric acid (GTA)). In the training of the model, weight parameters were adjusted iteratively to minimize the criterion function. The attributes which are present in the input/output vectors were normalized between 0 and 1 (within the limits of the sigmoid transfer function, i.e., logsig). The neurons present in the input layer (eight parameters) fed-in through connections with some weights used from −0.5 to +0.5; here, 176
DOI: 10.1021/acs.cgd.6b01403 Cryst. Growth Des. 2017, 17, 175−182
Crystal Growth & Design
Article
Table 1. Sixty-One CCs Used in This Study, Respective CSD Refcodes, Stoichiometric Ratios, ΔpKa and Elatt name of the component
code
caffeine theophylline
CAF THP
isonicotinamide
INA
DL-malic
nicotinamide acid
NA DLMA
D-malic
acid
DMA
glutaric acid
GTA
gentisic acid salicylic acid
GNA SA
p-coumaric acid-I
PCA-I
p-coumaric acid-II saccharin urea glutaric acid
PCA-II SAC URE GTA-I
glutaric acid p-coumaric acid
GTA-II PCA
4-nitroaniline 2-iodo-4-nitroaniline 2-fluoro-5-nitroaniline 4-fluoro-3-nitroaniline 4-chloro-3-nitroaniline 2-chloro-5-nitroaniline 4-iodo-3-nitroaniline 2,4-dinitrobenzoic acid 2-fluoro-5-nitrobenzoic acid salicylic acid
4NAN 2I4NAN 2F5NAN 4F3NAN 4C3NAN 2C5NAN 4I3NAN 24DNBA 2F5NBA SA
salicylic acid_I oxalic acid
SA-I OXA
malonic acid
MLA
succinic acid
SCA
glutaric acid
GTA
adipic acid
ADA
pimelic acid
PIA
suberic acid
SUA
azelaic acid
AZA
fumaric acid
FUA
4-ketopimelic acid
4KPIA
12-bromododecanoic acid salicylic acid
12BDA SA
pKaa 0.7 (cb) 1.7 (cb) 8.77 (ca) 3.61 10.61 3.35 3.40 5.11 3.40 5.11 4.31 5.41 2.97 2.97 13.82 4 9.51 *M 9.51 *M 11.68 0.10 4.31 5.41 5.41 4 9.51 *M 1 0.46 *M 0.52 *M 1.42 *M 1.90 0.40 *M 1.28 *M 1.43 2.69 *M 2.98 13.82 13.82 1.23 4.19 2.83 5.69 4.16 5.61 4.31 5.41 4.43 5.41 4.71 5.58 4.52 5.49 4.550 5.498 3.03 4.44 3.68 *M 4.42 *M 4.95 *M 2.98 13.82
cocrystal
THP/DLMA THP/DMA THP/GTA THP/GNA THP/SA THP/PCA-I THP/PCA-II THP/SAC THP/URE CAF/GTA-I CAF/GTA-II CAF/PCA CAF/4NAN CAF/2I4NAN CAF/2F5NAN CAF/4F3NAN CAF/4C3NAN CAF/2C5NAN CAF/4I3NAN CAF/24DNBA CAF/2F5NBA CAF/SA CAF/SA-I INA/OXA INA/MLA INA/SCA INA/GTA INA/ADA INA/PIA INA/SUA INA/AZA INA/FUA INA/4KPA INA/12BDA INA/SA
177
−1.7 −3.66 −1.7 −3.66 −2.6 −3.36 −1.27 −1.27 −12.12 −2.3 −7.81 −7.81 −9.98 1.60 −3.61 −4.71 −4.71 −2.3 −7.81 −0.3 0.24 0.18 −0.72 −1.2 0.3 −0.58 −0.73 −1.99 −2.28 −13.12 −13.12 2.38 −0.58 0.78 −2.08 −0.55 −2 −0.7 −1.8 −0.82 −1.8 −1.1 −1.97 −0.91 −1.88 −0.94 −1.88 0.58 −0.83 −0.07 −0.81 −1.34 0.63 −10.21
Elatt of CCs (kcal/mol)
ratio
−67.876
1:1
CIZTAH
−67.197
1:1
CODCOO
−60.167
1:1
XEJXIU
−63.764 −55.419
1:1 1:1
DUCROJ KIGLES
−64.651
1:1
IJIBEJ
−63.937 −59.602 −52.728 −59.068
1:1 1:1 1:1 1:1
IJIBEJ01 XOBCUN DUXZAX EXUQUJ01
−59.512 −63.746
1:1 1:1
EXUQUJ IJEZUT
−53.417 −49.364 −52.866 −52.236 −57.170 −53.369 −55.371 −61.841 −57.534 −54.979
1:1 1:1 1:1 1:1 1:1 1:1 1:1 1:1 1:1 1:1
LATGUK LATFUJ LATHEV LATGIY LATGEU LATGOE LATGAQ LATHAR LATHIZ XOBCAT
−54.990 −84.971
1:1 2:1
XOBCAT01 ULAWAF
−126.336
2:1
ULAWEJ
−85.669
2:1
LUNNUD
−56.894
1:1
ULAXAG
−57.713
1:1
ULAXEK
−58.609
1:1
ISIJEA
−62.187
1:1
ISIJIE
−61.805
1:1
ISIJAW
−84.215
2:1
LUNNOX
−91.711
2:1
LUNNIR
−62.859 −50.890
1:1 1:1
LUNMUC XAQQEM
cocrystal Refcode
DOI: 10.1021/acs.cgd.6b01403 Cryst. Growth Des. 2017, 17, 175−182
Crystal Growth & Design
Article
Table 1. continued name of the component
code
3-hydroxybenzoic acid
3HBA
4-hydroxybenzoic acid
4HBA
4-fluorobenzoic acid 3-nitrobenzoic acid 2-hexeneoic acid cinnamic acid
4FBA 3NBA 2HEA CIA
chloroacetic acid (RS)-2-phenylpropionic acid (R)-2-phenylpropionic acid DL-mandelic acid clofibric acid resorcinol
CAA 2PPARS 2PPAR DLMDA CFA REOL
hydroquinone
HQ
3-(N,N-dimethylamino)benzoic acid
3NNDMABA
3,5-bis(trifluoromethyl)benzoic acid meclofenamic acid fumaric acid monoethyl ester fumaric acid
35TFMBA MEFA FAMEE FUA
glutaric acid
GTA
4-hydroxybenzoic acid
4HBAII
ethyl paraben 2-chloro-4-nitrobenzoic acid tolfenamic acid mefenamic acid niflumic acid furosemide
EPB 2C4NBA TOFA MEFA NIFA FURA
pKaa 4.06 9.92 4.48 9.32 4.15 3.47 5.13 3.89 4.44 2.85 4.34 4.34 3.85 3.0 9.32 11.1 9.85 11.4 3.76 4.92 3.81 3.79 3.48 3.03 4.44 4.31 5.41 4.48 9.32 8.34 0.94 3.88 3.79 1.88 4.25
cocrystal INA/3HBA INA/4HBA
*M (cis) (trans)
INA/4FBA INA/3NBA INA/2HEA INA/CIA INA/CAA INA/2PPARS INA/2PPAR INA/DLMDA INA/CFA INA/REOL INA/HQ
*M *M *M *M
INA/ 3NNDMABA INA/35TFMBA INA/MEFA INA/FAMEE NA/FUA NA/GTA NA/4HBAII NA/EPB NA/2C4NBA NA/TOFA NA/MEFA NA/NIFA NA/FURA
Elatt of CCs (kcal/mol)
ratio
cocrystal Refcode
−53.994
1:1
LUNMEM
−55.167
1:1
VAKTOR
−49.167 −54.057 −48.183 −52.787
1:1 1:1 1:1 1:1
ASAXUN01 ASAXOH AJAKAX LUNMAI
−44.004 −48.773 −48.469 −56.255 −54.279 −77.890
1:1 1:1 1:1 1:1 1:1 2:1
LUNNAJ ROLFOO RONDAA LUNPAL UMUYUX VAKTUX
−76.792
2:1
VAKVIN
−51.969
1:1
LUNMIQ
−51.324 −62.583 −51.755 −54.457
1:1 1:1 1:1 1:1
LUNMOW SAXPAK LUNNEN NUKYAU
−57.708
1:1
NUKYEY
−53.731
1:1
RUYHEZ01
−51.664 −54.546 −87.578 −85.931 −62.731 −76.015
1:1 1:1 2:1 2:1 1:1 1:1
GOGQID SUTTUX EXAQIE EXAQOK EXAQEA YASGOQ
−0.45 −6.31 −0.87 −5.71 −0.54 0.14 −1.52 −0.28 −0.83 0.76 −0.73 −0.73 −0.24 0.61 −5.71 −7.49 −6.24 −7.79 −0.15 −1.31 −0.2 −0.18 0.13 0.32 −1.09 −0.96 −2.06 −1.13 −5.97 −4.99 2.41 −0.53 −0.44 1.47 −0.9
a
It is noteworthy to mention that some of the pKa values which we used were extracted from the literature, and the compounds which do not have pKa values were calculated by using the Marvin software (which are marked as *M in the table). These 61 CCs systems cover a wide range of ΔpKa values from −9.98 to +2.41, between the acceptor and donor functional groups. ca-conjugate acid value, cb-conjugate base value.
Table 2. Information about the CSD Analysis (CSD version 5.37, update 1 (Nov 2015) on CAF, THP, NA, and INA name of the category total no. of entries organic other entries polymorphs solvates hydrates salts total no. of anhy. CCs
ratio of binary CCs no. of CCs used for PCP studies no. of CCs used for structural features of CCs combinations (salts, hydrates, CC, solvates and hydrates etc.)
CAF
THP
NA
INA
189 156 33 06 06 02 02 80
114 87 27 07 01 05 03 41 ternary 03 1:2 2:1 01 04 19 19 30
449 180 269 07 02
351 184 167 06 03 03 19 119
binary 80 1:1 1:2 64 02 39 41 60
ternary 2:1 14
binary 38 1:1 33
18 92 binary 85 1:1 66
ternary 06 1:2 2:1 06 12 40 45 62
binary 113 4:1 1:1 01 84
ternary 06 1:2 2:1 05 24 44 69 34
Each neuron in the input layer is connected to all neurons in the hidden layer; thereafter, the neurons present in the hidden layer will transfer the information to output layer property, i.e., Tm of the CCs
the value 0.3 is used as for the learning parameters and 0.5 for the momentum. The total weight of the input layer is nothing but the weighted sum of the inputs from all the eight input parameters. 178
DOI: 10.1021/acs.cgd.6b01403 Cryst. Growth Des. 2017, 17, 175−182
Crystal Growth & Design
Article
through transfer/activation function. The choice of hidden layer(s) and the number of network parameters used here was largely attained by a trial-and-error process. However, we succeeded to get the best results with one hidden layer which contains eight neurons, the same number of neurons as present in the input variables (eight). Kolmogotov theorem states that