CRYSTAL GROWTH & DESIGN

Cambridge Structural Database Analysis of Molecular Complementarity in Cocrystals
László Fábián*
Pfizer Institute for Pharmaceutical Materials Science, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, U.K., CB2 1EZ

2009 VOL. 9, NO. 3 1436–1443

Received August 6, 2008; Revised Manuscript Received December 10, 2008

ABSTRACT: A set of complete, reliable cocrystal structures was extracted from the Cambridge Structural Database, and molecular descriptors, usually used in quantitative structure-activity relationship studies, were calculated for each molecule. The resulting database describes pairs of molecules that form cocrystals with each other in terms of their calculated molecular properties. Statistical analysis of the data was performed to identify properties that tend to be similar or complementary for such pairs of molecules. The strongest descriptor correlations found relate to the shape and polarity of cocrystal formers. Hydrogen bond donor and acceptor counts of cocrystal formers, on the other hand, show no obvious statistical relationship.

Introduction
The design of cocrystals has been a field of intensive research in recent years.1,2 With reliable design strategies, cocrystals could offer a modular approach to developing materials with desirable properties.3 Pharmaceutical cocrystals4-7 are of particular interest, since the molecular structure of an active pharmaceutical ingredient (API) is determined by its required biological activity. The unfavorable physical properties of a potential solid drug product thus cannot be tackled by modifying the API molecules, but only by changing formulations. Even though salt formation is the most widely used method to change a drug formulation,8 the lack of suitable acidic or basic groups in the API or problems with the physical properties of the salts (e.g., their tendency to form variable solvates) may preclude the use of salt forms.
Cocrystals can provide a viable alternative in such cases, as demonstrated by cocrystals of model APIs with improved dissolution characteristics,5 hydration stabilities,6 or melting points.7 The rational design of cocrystals is usually based on supramolecular synthons.9 If the molecules are able to associate by utilizing different, competing synthons, a design strategy must be concerned with the hierarchy of the synthons, that is, which of the possible synthons are formed at the expense of others. For relatively strong, specific interactions, such as hydrogen bonds and halogen bonds, synthon hierarchies can be established and successfully exploited.2 Nevertheless, the multitude of weaker, nonspecific interactions seriously limits our ability to design cocrystals. Homologous compounds (with the same functional groups and the same possible synthons) often behave differently in cocrystal formation, while some molecules are able to form cocrystals without any obvious synthons connecting them.10 These limitations are usually handled by cocrystal screening,11 a trial-and-error procedure. For practical applications, development costs will depend on the number of screening experiments needed before a suitable cocrystal former is found. It is therefore important to identify further factors, beyond synthon matching, that influence the success or failure of screening experiments. The aim of this work is to find such factors by the statistical analysis of data on cocrystals from the Cambridge Structural Database12 (CSD, version 5.29, November 2007).

* To whom correspondence should be addressed. E-mail: fabian@ccdc.cam.ac.uk; tel.: +44 1223 763498; fax: +44 1223 336033.

Experimental Methods
Cocrystal Database Creation. The CSD was searched for ordered, error-free organic crystal structures (at least one C atom; only C, H, N, O, S, P, F, Cl, Br, and I atoms allowed). Duplicates and unreliable or incomplete structures were filtered out by using the “best representative” list of van de Streek.13 The remaining structures were exported from the CSD to mol2 files, which were used for further processing and calculations. Molecular (sum) formulas, formal charges (as stored in the CSD), and InChI identifiers14 were calculated for each residue. Cocrystals were defined as structures containing at least two neutral residues with different InChI identifiers (i.e., different structural formulas) that do not appear in a list of common solvents.15 Cocrystals of molecules that occur at least 10 times in the data set were excluded to avoid the possible bias caused by the specific requirements of popular cocrystal formers (Table 1). The resulting database contains 974 cocrystal structures formed by 1949 molecules.

Calculation of Molecular Descriptors. The complete set of quantitative structure-activity relationship (QSAR) type descriptors available in our software tools was used to characterize the 1949 molecules, without any prior consideration of their importance in cocrystal formation. Altogether, 131 molecular descriptors were calculated for each molecule by using locally written Perl scripts and the programs RPluto,16 JOElib2,17 and Sybyl.18 The 131 descriptors include simple atom, bond, and group counts, hydrogen bond donor and acceptor counts, size and shape descriptors, surface area descriptors (with partitioned and charge-weighted variants), and molecular electrostatic descriptors (see Table S1 in the Supporting Information for a complete list).
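The cocrystal definition used in Cocrystal Database Creation can be expressed as a simple two-step filter. The Python sketch below is illustrative only: the `Residue` record, the placeholder InChI strings, and the tiny `COMMON_SOLVENTS` set are hypothetical stand-ins (the study worked from mol2 exports, computed InChIs, and the solvent list of ref 15), and it assumes that "occurs at least 10 times" means the number of cocrystal structures containing a molecule.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical, simplified record for one residue of a CSD entry.
# Field names are illustrative; they are not a CSD or mol2 API.
@dataclass(frozen=True)
class Residue:
    inchi: str
    formal_charge: int


# Placeholder solvent list (water, methanol); the study used the list of ref 15.
COMMON_SOLVENTS = {"InChI=1S/H2O/h1H2", "InChI=1S/CH4O/c1-2/h2H,1H3"}


def cocrystal_formers(residues):
    """Distinct InChIs of the neutral, non-solvent residues of one structure."""
    return {r.inchi for r in residues
            if r.formal_charge == 0 and r.inchi not in COMMON_SOLVENTS}


def select_cocrystals(structures, max_occurrences=10):
    """structures: dict mapping a (hypothetical) refcode to a list of Residue."""
    # Step 1: keep structures with at least two different neutral, non-solvent molecules.
    cocrystals = {}
    for code, residues in structures.items():
        formers = cocrystal_formers(residues)
        if len(formers) >= 2:
            cocrystals[code] = formers
    # Step 2: count how many cocrystals each molecule appears in ...
    counts = Counter(inchi for formers in cocrystals.values() for inchi in formers)
    # ... and drop cocrystals containing any molecule seen max_occurrences times or more.
    return {code: formers for code, formers in cocrystals.items()
            if all(counts[inchi] < max_occurrences for inchi in formers)}
```

In this sketch, occurrences are counted only after the first filter, so salts and single-component solvates do not add to a molecule's frequency; the text does not state whether the original workflow counted in exactly this way.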
Partial atomic charges for the calculation of electrostatic descriptors were assigned by using the Gasteiger-Hückel method in Sybyl.18

10.1021/cg800861m CCC: $40.75 © 2009 American Chemical Society Published on Web 01/14/2009

Table 1. List of the Most Frequent Cocrystal Formers in the Cocrystal Data Set a,b

no. of structures   compound name
109   4,4′-bipyridine
85    tetracyano-p-quinodimethane
70    hydroquinone
65    18-crown-6
60    urea
47    (E)-4,4′-diazastilbene
47    2,2′-dihydroxy-1,1′-binaphthyl
46    cholic acid
39    phenazine
38    triphenylphosphine oxide
37    1,3,5-trinitrobenzene
36    isonicotinamide
33    tetrathiafulvalene
31    fumaric acid
31    succinic acid
29    hexamethylenetetramine
28    iodine
27    p-benzoquinone
27    pyrene
27    1,2-bis(4-pyridyl)ethane
26    oxalic acid
26    tetracyanoethylene
25    3α,7α,12α-trihydroxy-5β-cholamide
24    anthracene
24    4-aminobenzoic acid
24    3,5-dinitrobenzoic acid
24    pyrazine
24    1,1,6,6-tetraphenylhexa-2,4-diyne-1,6-diol
22    C-methylcalix[4]resorcinarene
22    5,5-diethylbarbituric acid
22    2-pyridone
20    2-aminopyrimidine
20    1,4-diiodotetrafluorobenzene
20    hexafluorobenzene
20    pyromellitic dianhydride
20    1,2,4,5-tetracyanobenzene
17    1,1′-bis(4-hydroxyphenyl)cyclohexane
16    caffeine
16    thiourea
16    2,5-bis(4-pyridyl)-1,3,4-oxadiazole
15    acridine
15    benzene-1,3,5-tricarboxylic acid
15    picric acid
15    resorcinol
14    adipic acid
14    glutaric acid
14    carbamazepine
14    4-nitrobenzoic acid
14    4-nitrophenol
14    1,10-phenanthroline
13    4,4′-biphenol
13    2,2′-bipyridine
13    4,4′-dipyridyl N,N′-dioxide
13    1,4-diazabicyclo[2.2.2]octane
13    fullerene (C60)
13    naphthalene
13    (R)-(1-naphthyl)glycyl-(R)-phenylglycine
13    tetrachloro-p-benzoquinone
12    cis,cis-1,3,5-cyclohexanetricarboxylic acid
12    octafluoronaphthalene
12    phenol
12    1,4-phenylenediamine
12    tetrabromomethane
12    theophylline
12    trans-1,5-dichloro-9,10-diethynyl-9,10-dihydroanthracene-9,10-diol
11    benzoic acid
11    cis-anti-cis-dicyclohexano-18-crown-6
11    trans-1,2-diaminocyclohexane
11    2,5-piperazinedione
11    tetraiodoethene
11    trans-2,3-bis(1,1-diphenylhydroxymethyl)-1,4-dioxaspiro[4.5]decane
11    trans-4,5-bis(diphenylhydroxymethyl)-2,2-dimethyl-1,3-dioxolane
11    O,O′-dibenzoyl-tartaric acid
11    hexamethylbenzene
11    sebacic acid
10    chloranilic acid
10    4-hydroxybenzoic acid
10    terephthalic acid
10    perylene
10    1,3-bis(4-pyridyl)propane
10    2,3,5,6-tetrafluoro-7,7,8,8-tetracyanoquinodimethane

a Common solvents15 are not included in the list. b The number of structures refers to the subset of complete and ordered organic structures (see Experimental Methods). The total number of known cocrystals with a given compound may be significantly higher than listed here, especially for cavitands and inclusion host compounds.

Figure 1. (a) Histogram of the number of heavy atoms in the molecules of the cocrystal data set. The continuous line represents the normal distribution with the same mean (17.4) and standard deviation (12.7) as the observed distribution. Lower quartile (9), median (14), and upper quartile (21) provide a better description of the observed distribution than mean and standard deviation. (b) The same distribution shown as a box plot.

Figure 2. The number of heavy atoms in pairs of molecules taken from cocrystals, shown (a) as a scatter plot and (b) as a density plot. Lighter colors indicate more observed molecule pairs. The contour levels show the value of the two-dimensional probability density function.

Figure 3. The relationship of the fractional polar volumes (FPV) of pairs of molecules that formed cocrystals, shown (a) as a density plot and (b) as a box plot.

Statistical Analysis. Molecules that were found in the same cocrystal were combined into pairs. Each pair of molecules corresponds to a set of 2 × 131 molecular descriptors. As a first approximation, we analyzed descriptors in pairs; that is, only one descriptor per molecule was considered at a time. (In other words, the analysis was performed in 2 × 1-dimensional projections of the 2 × 131-dimensional parameter space.) If a particular pair of descriptors refers to molecular properties that influence cocrystal formation, then the descriptors are expected to assume favorable combinations of values more frequently than unfavorable ones. Consequently, pairs of descriptors that indicate some form of complementarity should be correlated. To find such correlations, correlation coefficients were calculated for all possible pairs of descriptors (131 × 130/2 = 8515 pairs). The distribution of descriptor values among the molecules is far from normal, which limits the usefulness of the most common statistical parameters, such as the mean and standard deviation (Figure 1). Therefore, nonparametric statistical descriptors, which are meaningful irrespective of the shape of the distributions, were also used. Distributions were summarized by their median, lower quartile, and upper quartile values rather than by mean and standard deviation. (The median is the value that “splits” a data set such that 50% of the data values are lower and 50% are higher than the median. Quartiles are defined analogously as the values that are higher than 25% (lower quartile) and 75% (upper quartile) of the data set, respectively.) In addition to the more common Pearson’s correlation coefficient (r, based on mean and standard deviation), Spearman’s nonparametric correlation coefficient (ρ, based on the ranking of values) was calculated for each molecular descriptor pair. For descriptor pairs with a correlation coefficient of at least 0.25, two-dimensional density plots and box plots were created to present their relationship visually.

The usual method of showing the relationship of two variables in a data set is a scatter plot, such as the one shown in Figure 2a. The large number of data points and their overlap, however, make the interpretation of Figure 2a difficult. It is easier to see the underlying trends if the individual data points are replaced by a function that describes how many data points fall in a specific region of the plot. Density plots (Figure 2b) provide such a representation. The two axes of a density plot are the same as those of the corresponding scatter plot, while the number of observations (data points) in each area of the plot is indicated by color coding: lighter colors represent areas with more data points, and darker colors represent areas with fewer data points. Smoothed density plots were generated by applying two-dimensional kernel density estimates.19 The color scale of Figure 2b thus refers to the estimated two-dimensional probability density function. The number of data points in any area of Figure 2a can be obtained by integrating the probability density function (Figure 2b) over the given area and multiplying the result by the total number of data points, 1949. Box plots represent the distribution of a variable, so they can be considered a simplified alternative to histograms (Figure 1b). The top and bottom of the box correspond to the upper and lower quartiles, and the thick horizontal line in the box represents the median, while the whiskers attached to the box stretch to the minimum and maximum of the distribution, apart from individual outliers, which are marked as dots. Box plots (e.g., Figure 3b) are used to compare the distribution of a variable for different groups of data, with each group represented as a box. In the box plots presented here, groups are defined by ranges of a descriptor that describes one molecule (FPV mol 1 in Figure 3b).
The individual boxes then show the distribution of another descriptor that refers to the cocrystal-forming partner of the reference molecule (FPV mol 2 in Figure 3b). If the two molecular descriptors are correlated, a clear trend in the position of the boxes is seen, showing that the distribution of the second variable shifts gradually as the first variable changes. The stronger the correlation, the less adjacent boxes overlap.
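The grouped box plots described above (FPV of mol 1 binned into ranges, with the FPV distribution of the partner, mol 2, summarized in each bin) reduce to quartile calculations per bin. A minimal Python sketch follows; the bin edges and data are made up, and quartile conventions differ slightly between programs, so `method="inclusive"` (linear interpolation, the same rule as R's default `quantile()`) is an assumption about how the boxes were drawn. The last line checks the rule of thumb, used later for Figure 3b, that for normally distributed data the box spans roughly the median ± 0.67σ.

```python
from statistics import NormalDist, quantiles


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum (needs >= 2 values).

    method="inclusive" interpolates linearly between order statistics.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def grouped_summaries(pairs, edges):
    """Group (x1, x2) pairs by the bin of x1 and summarize the x2 values per bin.

    edges: increasing bin boundaries; x1 falls in bin (lo, hi) if lo <= x1 < hi.
    Pairs whose x1 lies outside all bins are ignored.
    """
    groups = {}
    for x1, x2 in pairs:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= x1 < hi:
                groups.setdefault((lo, hi), []).append(x2)
                break
    return {b: five_number_summary(v) for b, v in sorted(groups.items())}


# For a normal distribution the quartiles lie about 0.674 sigma from the median.
QUARTILE_Z = NormalDist().inv_cdf(0.75)
```

Each value in the returned dictionary corresponds to one box: its middle three numbers give the box and median line, and the first and last give the whisker ends (before any outlier rule is applied).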

All statistical calculations were performed and statistical figures were created with the R package.20
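The two correlation coefficients described in the Statistical Analysis can be written in a few lines. The calculations in this work were done in R; the Python sketch below is an independent illustration, not the original code. Spearman's ρ is computed as Pearson's r of the ranks, with tied values given their average rank; the short data vectors in the tests are invented.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the values."""
    return pearson(average_ranks(x), average_ranks(y))


# Number of unordered pairs of distinct descriptors quoted in the text.
N_DESCRIPTOR_PAIRS = comb(131, 2)  # 131 * 130 / 2 = 8515
```

Because ρ depends only on the ordering of the values, a monotonic but strongly skewed descriptor (such as the dipole moment distribution discussed in the Results) can give noticeably different r and ρ values.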

Results
The initial density plots obtained for a variety of descriptors suggested that our data are composed of two subsets that show different behaviors. In particular, a significant negative correlation (ρ = -0.26) was found between the numbers of heavy (i.e., non-hydrogen) atoms in the two molecules. This would indicate a preference of small molecules to cocrystallize with large ones. A density plot (Figure 2), however, reveals that this behavior is exhibited by only a small part of the data set. Molecules with ca. 5-25 heavy atoms show little discrimination in the size of their partners, while those with more than 30 heavy atoms cocrystallize predominantly with smaller molecules (fewer than 10 heavy atoms). The latter group is formed by classic inclusion compounds: crystals of a large host molecule with an awkward shape that cannot pack efficiently and a small guest molecule that fills the voids inside or between the host molecules. Since these molecule pairs showed a distinct behavior and we are interested in cocrystal formation of molecules without major packing frustration, inclusion compounds were excluded from further analysis. The remaining data set contains 710 cocrystal structures, each formed by molecules with 6-30 heavy atoms. The highest correlations for the reduced data set (Table S2, Supporting Information) form three groups, which reveal three qualitative trends. These trends and the correlations between hydrogen bond donor/acceptor counts are discussed in the following sections.
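The density plot of Figure 2b replaces the scatter of individual points by a two-dimensional kernel density estimate, and, as noted in the Experimental Methods, integrating that density over a region and multiplying by the number of data points gives the expected count in the region. The sketch below illustrates both steps with an isotropic Gaussian kernel. The bandwidth rule of the estimator actually used (ref 19) is not reproduced, so `bandwidth` is a user-chosen assumption, and the function names are hypothetical.

```python
from math import exp, pi


def kde2d(points, bandwidth):
    """Return f(x, y), a Gaussian kernel density estimate of 2-D points.

    Isotropic product kernel with one bandwidth h for both axes:
    f(x, y) = 1/(n h^2) * sum_i phi((x - xi)/h) * phi((y - yi)/h),
    where phi is the standard normal density.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2.0 * bandwidth ** 2))
            for xi, yi in points)

    return density


def expected_count(density, n, x_range, y_range, steps=100):
    """Integrate the density over a rectangle (midpoint rule), then scale by n."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
    total = sum(density(x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy)
                for i in range(steps) for j in range(steps))
    return n * total * dx * dy
```

A smaller bandwidth follows individual points more closely, while a larger one smooths the contours; the choice changes the appearance of a density plot but not the total integral, which is always 1 over the whole plane.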


Table 2. Correlation Coefficients for Molecular Descriptors Related to Polarity

descriptor (p)a   r(p1, p2)   ρ(p1, p2)
dipole              0.28        0.39
PV                  0.22        0.30
FPV                 0.37        0.41
FNO                 0.30        0.31
PSA                -0.14       -0.08
FPSA               -0.01        0.01
log P(calcd)        0.08        0.10

a Descriptor definitions: PV: polar volume, the volume of N, O, S atoms, and H atoms bonded to these atoms in the molecule; FPV = PV/molecular volume; FNO = (no. of N atoms + no. of O atoms)/no. of heavy atoms; PSA: polar surface area (defined analogously to PV); FPSA = PSA/molecular surface area; log P: logarithm of the octanol-water partition coefficient, calculated using the method of ref 21.
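FNO, as defined in the footnote to Table 2, needs only the molecular formula. The sketch below parses a simple formula string (no brackets, charges, or isotopes) with a regular expression; this parser is an illustrative assumption, not one of the tools used in the study. The test formulas are hand-derived for the two molecules of Figure 7a (1,12-dodecanedicarboxylic acid, C14H26O4, and the naphthalene bis-amide, C24H20N4O2) and reproduce the 18 and 30 heavy atoms quoted in the Discussion.

```python
import re
from collections import Counter


def element_counts(formula):
    """Parse a simple formula such as 'C14H26O4' into element counts.

    Assumes plain element symbols followed by optional integers
    (no parentheses, hydrates, or charges).
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def fno(formula):
    """FNO = (no. of N atoms + no. of O atoms) / no. of heavy (non-H) atoms."""
    counts = element_counts(formula)
    heavy = sum(n for element, n in counts.items() if element != "H")
    return (counts["N"] + counts["O"]) / heavy
```

For example, `fno("C14H26O4")` gives 4/18 ≈ 0.22 and `fno("C24H20N4O2")` gives 6/30 = 0.20, matching the values discussed for Figure 7a.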

Molecular Polarity. The strongest correlations found are related to the polarity of the molecules. The positive sign of the correlation coefficients (Table 2) suggests that molecules preferentially form cocrystals with partners of similar polarity. Molecular polarity is not a rigorously defined term, so a number of descriptors can be associated with it. It is apparent from Table 2 that these descriptors are not equivalent. The highest correlation found in this analysis relates the fractional polar volumes (FPV) of the cocrystallized molecules. FPV is defined as the fraction of the molecular volume that belongs to polar atoms (N, O, S atoms, and H atoms bonded to N, O, or S). A simpler alternative to FPV is the descriptor FNO, which is obtained by dividing the total number of N and O atoms by the number of heavy atoms in the molecule. FNO still shows a relatively strong correlation (Table 2), and it can be easily calculated from the molecular formula. The dipole moments of molecules in cocrystals show a similarly strong relationship. (The large difference between the r and ρ values for dipoles in Table 2 is caused by the strongly skewed distribution of dipole moments: small dipoles are much more frequent in the data set than large ones.) Polar surface area (PSA) and the octanol-water partition coefficient (log P) are frequently used in drug discovery to quantify molecular polarity,22 but they seem to have little relevance for cocrystal formation. Correlation coefficients are useful in selecting the interesting descriptors, but they do not show how reliably we can use them for cocrystal design. As illustrated in Figure 3, density plots and box plots are helpful in this regard. The density of observed cocrystals (and that of the corresponding pairs of FPV values) is highest along the diagonal of Figure 3a, and it decreases gradually with increasing distance from the diagonal.
The box plot in Figure 3b compares molecules in four different ranges of their FPV values (mol 1) in terms of the distribution of the FPV values of their cocrystal-forming partners (mol 2 axis in the graph). The median values (indicated by the horizontal line in the box) show the same trend as the density plot. The degree of overlap between adjacent boxes gives a semiquantitative measure of the significance of this trend. (If the data were normally distributed, the boxes would span the median ± 0.67σ interval.) Since the whiskers stretch over almost the complete range of FPV values, not even the highest correlation can be regarded as the manifestation of a strict rule. Nevertheless, there is an obvious trend, and it can be judged from Figure 3 whether and how much molecular polarity favors cocrystal formation by two molecules with particular FPV values.

Shape and Size. Simple descriptors of molecular shape and size were defined following the ideas of the box model of crystal packing.23 In this model, the van der Waals volume of the molecule is enclosed in a rectangular box, and the long, medium, and short axes of this box are denoted L, M, and S, respectively. While L, M, and S refer to the size of the molecule, their ratios provide information about molecular shape. For example, S/L is small for planar molecules, and M/L is small for rod-shaped ones. Indeed, these axis ratios show much stronger correlations than the axis lengths (Table 3), suggesting that matching of molecular shapes is more important for cocrystal formation than matching of absolute molecular dimensions. Nonetheless, both the short and the long axes appear to be more influential than other frequently used size descriptors, such as molecular weight. The corresponding density plots (Figure 4) show that the S/L and M/L correlations can be interpreted similarly to the FPV correlations: the frequency of observed molecule pairs decreases gradually with increasing difference between the axis ratios. Both descriptors show a marked skewness: S/L is biased toward small values and M/L toward large values. An interesting feature of the M/L graph (Figure 4b) is that it is much narrower at low M/L ratios. Qualitatively, this means that the more elongated a molecule, the less likely it is to form a cocrystal with a molecule of different shape. The S/M density plot (Figure 4c) is complicated by the peculiar distribution of S/M values. The molecules in the database form two distinct groups, with S/M ratios around 0.5 and 0.75, respectively. Members of both groups form cocrystals more frequently with molecules from the same group than with molecules from the other group. The wide lobes between the two maxima in Figure 4c, however, indicate that there are several examples showing a marked deviation from the overall trend. The correlations found for the short (S) and long (L) axis dimensions are related to the shape correlations. The strong S/L shape correlation means that molecules of a flat shape tend to form cocrystals with other flat molecules. With approximately half of the molecules in the data set being planar (S < 5 Å), the shape correlation translates directly into similar S values (i.e., into cocrystals of planar molecules with planar molecules).
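The box-model shape ratios and the globularity descriptor (defined in the footnote to Table 3) are simple functions of the box dimensions, molecular surface area, and volume. The Python sketch below assumes those quantities have already been computed with the software listed in the Experimental Methods; the function names are hypothetical, and the 5 Å planarity threshold is the one quoted above. The sphere-area expression follows from r = (3V/4π)^(1/3), which gives A = 4πr² = (36πV²)^(1/3).

```python
from math import pi


def axis_ratios(l, m, s):
    """Shape ratios of the enclosing box with axes L >= M >= S (same length unit)."""
    if not l >= m >= s > 0:
        raise ValueError("expected L >= M >= S > 0")
    return {"S/L": s / l, "M/L": m / l, "S/M": s / m}


def is_planar(s, threshold=5.0):
    """Planarity flag as used in the text: short box axis below 5 Å."""
    return s < threshold


def globularity(surface_area, volume):
    """Surface area divided by that of a sphere of equal volume (1.0 for a sphere).

    A sphere of volume V has surface area (36 * pi * V**2) ** (1/3).
    """
    return surface_area / (36.0 * pi * volume ** 2) ** (1.0 / 3.0)
```

Since a sphere minimizes surface area for a given volume, globularity defined this way is never below 1; a cube, for instance, gives about 1.24.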
Table 3. Correlation Coefficients for Molecular Descriptors Related to the Shape and Size of the Molecules

descriptor (p)a   r(p1, p2)   ρ(p1, p2)
L                   0.17        0.16
M                   0.04        0.03
S                   0.19        0.22
S/L                 0.38        0.40
S/M                 0.38        0.38
M/L                 0.41        0.38
mol weight         -0.02       -0.05
volume             -0.09       -0.04
globularity         0.25        0.21

a Descriptor definitions: L, M, S: long, medium, and short axes of an enclosing box; globularity: molecular surface area divided by the surface area of a sphere with the same volume as the molecule.

Figure 4. Density plots showing the shape relationship of molecules in cocrystals. S, M, and L are the short, medium, and long axes of a rectangular box that encloses the molecule.

Figure 5. Accessible surface area of atoms with negative partial charge (ASAN) in molecules that form a cocrystal with partners having 0, 1, 2, 3, or more hydrogen bond donors.

The correlation between the long axis values is very weak, but the shape of the density distribution (Figure 4d) suggests a stronger correlation for larger L values. This is confirmed by the correlation coefficients calculated separately for structures with L1 < 14 Å and for those with L1 > 14 Å: r(L1, L2) = 0.08 for the former and 0.14 for the latter subset. The molecules with L > 14 Å typically exhibit an M/L ratio of ca. 0.5, so the correlation of the long axes is explained by the stronger tendency of elongated molecules to cocrystallize with partners of similar shape. Globularity is a shape descriptor that relates the surface area of a molecule to its volume. Globularity is low for molecules with a smooth surface, while bumps and hollows in the molecular shape increase its value. The correlation seen for this descriptor (Table 3) is linked to the packing frustration that could arise in a cocrystal formed by a bumpy and a smooth molecule. This shape relationship appears to be stronger for smooth molecules (i.e., those with lower globularity values), which are predominantly planar (see Supporting Information for figures).

Negative Molecular Surface and Hydrogen Bond Donors. The strongest relationships between different descriptors of two cocrystal-forming molecules link the negative surface area of one molecule to the number of hydrogen bond donors in the other (Figure 5; Table S2, Supporting Information). Donor H atoms have a positive partial charge, so they increase the positive surface area of molecules. Consequently, a positive correlation between donor group counts and negative surface area descriptors would be expected. Surprisingly, the sign of the actual correlation is negative: ρ(ASAN1, Dplu2) = -0.32. (ASAN is the accessible surface area of atoms with negative partial charge, computed using a probe radius of 1.5 Å, while Dplu is the total number of donor H atoms.) The box plot in Figure 5 reveals that the negative correlation is due mainly to a specific group of cocrystals, formed by a molecule with no donors and another with a large negative surface area. A manual survey of the corresponding cocrystals resolved the apparent contradiction: most of them are charge transfer complexes, which are formed by an aromatic hydrocarbon or a tetrathiafulvalene analogue (no hydrogen bond donors) and by another planar molecule with
delocalized electrons that is made π-electron deficient by several electron-withdrawing substituents (large negative surface area).

Hydrogen Bond Donors and Acceptors. The success of cocrystal design utilizing hydrogen-bonded supramolecular synthons clearly shows the importance of hydrogen bonds in forming cocrystals. One may thus expect donor/acceptor counts in our data set to reflect the distinguished role of such interactions. Figure 6, however, shows that neither absolute donor/acceptor counts nor their differences show the expected trend. (The number of donors is defined as the number of polar H atoms, while the number of acceptors is given by the number of possible acceptor heteroatoms, that is, by the number of O and N atoms except for N atoms with more than three bonds.) The results remain the same even if the charge transfer complexes and cocrystals with a stoichiometry other than 1:1 are removed from the data set and/or if simple donor/acceptor counts are replaced by the average number of donor and acceptor hydrogen bonds a functional group forms in the CSD (Figure S6, Supporting Information).24 The contradiction between these results and the known importance of hydrogen bonds is only apparent. What these results show is that counting donors and acceptors is insufficient to describe their complementarity. The formation of synthons is governed by the strength of hydrogen bonds between cocrystal formers rather than by the number of available groups. Modeling of hydrogen bond donor/acceptor strengths, however, requires more sophisticated calculations25,26 than those applied in the current analysis. We plan to extend this work in the near future by using the logit hydrogen bond propensity model26 to identify the “best” (i.e., the most likely) homo- and heteromolecular hydrogen bond(s) that can be formed by the donors and acceptors of the molecules in a cocrystal.

Figure 6. The relationship of hydrogen bond donor/acceptor counts in molecule pairs from cocrystals: (a) donors in one molecule vs acceptors in the other molecule; (b) difference between the number of acceptors and donors in both molecules. The number of donors is given as the number of donor H atoms, while each available acceptor atom is counted as one acceptor.

Figure 7. Cocrystal structures from the CSD. (a) 2,6-Bis(((6-methylpyrid-2-yl)amino)carbonyl)naphthalene 1,12-dodecanedicarboxylic acid, JOHPUR;27 (b) 3-(2,6-dimethylphenyl)pyrimido(4,5-b)-1,8-naphthyridine-2,4(1H,3H)-dione N-n-butyl-N′-(4-methylpyridin-2-yl)urea, IXUDIO;28 (c) bis(1,2,5)-thiadiazolotetracyanoquinodimethanide m-divinylbenzene, HEJHOT;29 (d) n-heptadecanoic acid nicotinamide, FIFLAI.30 Atom colors: red, oxygen; blue, nitrogen; yellow, sulfur; gray, carbon; white, hydrogen.

Discussion
The above statistical analysis showed that the majority of cocrystals in the CSD are formed by molecules of similar polarities and shapes, but deviations from these overall trends have also been observed. Analysis of representative structures that either follow or do not follow these trends may thus help their qualitative chemical interpretation. Most cocrystals are obtained by solution crystallization, so the preference for similar molecular polarities could be a
consequence of the comparable solubilities of the cocrystal formers in the crystallization solvent. The most important polarity descriptors in the QSPR prediction of solubility values are often log P and various surface area descriptors.22b The lack of correlation in these descriptors (Table 2) thus suggests that solubility is not the only reason behind the preference for similar polarities. Figure 7 shows examples of cocrystals27-30 formed by molecules with both similar (Figure 7a,b) and dissimilar (Figure 7c,d) polarities in terms of the FPV and FNO descriptors. (For example, FNO = 4 O atoms/18 heavy atoms = 0.22 for the acid, and FNO = (2 O atoms + 4 N atoms)/30 heavy atoms = 0.20 for the amide in Figure 7a.) In three of these four cocrystals, the molecules are arranged such that distinct hydrophobic and hydrophilic slabs can be identified. (The hydrophobic region is in the middle of Figure 7a,b and on the left-hand side of Figure 7d.) Polar and apolar regions are often segregated in crystals,
and the topology of the separate regions (layers, rods, spheres) depends on the polar/apolar volume ratio.31 Consequently, molecules with dissimilar FPV values are expected to favor different topologies, which, in turn, may make the formation of a cocrystal with the preferred separation of hydrophobic and hydrophilic regions more difficult. Exceptions to the similar-polarity rule can be expected when specific favorable interactions link polar and apolar groups to each other or when the orientation of functional groups allows effective segregation without matching overall molecular polarities. The first case is illustrated by HEJHOT (Figure 7c), where charge transfer interactions generate stacks of alternating polar and apolar molecules. The second type of exception is demonstrated by FIFLAI (Figure 7d). Although the nicotinamide molecules are polar and the margaric acid (n-heptadecanoic acid) molecules are mostly apolar, the carboxyl group is attached to the end of the long hydrophobic chain, so the two molecules can hydrogen bond without unfavorable hydrophobic-hydrophilic contacts. Three of the example cocrystals (Figure 7a-c) are formed by molecules of similar shapes. The margaric acid-nicotinamide cocrystal (FIFLAI, Figure 7d) is an example of cocrystal formation by molecules of different shapes. Close packing of this cocrystal is possible because of the specific mutual orientation of the hydrogen-bonded molecules. The nicotinamide molecule has a width and depth similar to those of the alkyl chain, so its attachment to the end of the chain in this specific orientation does not prevent close packing of the alkyl chains. If the shapes of the molecules are more similar (Figure 7a-c), then close packing places fewer restrictions on their relative orientation.
The molecules of Figure 7a, for example, could form a close-packed array with any relative shift between the long axes of the acid and amide molecules (horizontal in the figure), so the arrangement that optimizes hydrogen bonding is easily realized. The charge transfer complex HEJHOT (Figure 7c) illustrates why such complexes generate a negative correlation between negative surface area and hydrogen bond donors (Figure 5). The electron-rich hydrocarbon molecule (with no hydrogen bond donors) interacts with an electron acceptor that has several N atoms in its electron-withdrawing substituents (contributing to its negative surface area). As in the overwhelming majority of cocrystals, none of the molecules in the example structures of Figure 7 has more hydrogen bond donors than acceptors. Consequently, an abundance of acceptors in one molecule of a cocrystal cannot be compensated for by an abundance of donors in the other. The apparent hydrogen-bonded heterosynthons in Figure 7a,b,d illustrate that such reliable interactions will occur irrespective of any donor/acceptor “imbalance” in the molecules.

Conclusion
Statistical analysis of known cocrystals in the CSD has led to the identification of molecular properties that influence cocrystal formation. The shapes and polarities of molecules that form cocrystals tend to be similar, while there is no indication of complementarity with regard to the numerical “imbalance” of hydrogen bond donors and acceptors. Inclusion compounds and charge transfer complexes have been revealed as distinct subsets of cocrystals. The observed relationships may provide useful qualitative guidelines for the rational design of cocrystals and, by using the simple molecular descriptors presented here, may form the basis of a semiquantitative predictive model.


Understanding the relationship between these ideas and specific supramolecular synthons is an important aspect of their practical utility. Initial experimental results suggest that a cocrystal is likely to be obtained if both the molecular descriptors discussed here and the available supramolecular heterosynthons favor its formation.32 Further experiments to elucidate this relationship, and the development of a predictive computational model based on molecular descriptors and hydrogen bond propensities, are in progress.

Acknowledgment. The author is grateful to David Palmer (University of Cambridge) for his help with QSAR descriptors, to Samuel Motherwell (CCDC) for advice on shape descriptors and on using RPluto, and to Frank Allen (CCDC) for his comments on the manuscript. William Jones (University of Cambridge), Neil Feeder (Pfizer), and Pete Marshall (Pfizer) are acknowledged for useful discussions. The financial support of Pfizer Inc. is gratefully acknowledged.

Supporting Information Available: Complete list of molecular descriptors used, descriptor pairs with the highest correlations, and additional density and box plots. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) (a) Bhogala, B. R.; Basavoju, S.; Nangia, A. CrystEngComm 2005, 7, 551–562. (b) Du, M.; Zhang, Z.-H.; Zhao, X.-J.; Cai, H. Cryst. Growth Des. 2006, 6, 114–121. (c) Childs, S. L.; Hardcastle, K. I. CrystEngComm 2007, 9, 364–367. (d) Thalladi, V. R.; Dabros, M.; Gehrke, A.; Weiss, H.-C.; Boese, R. Cryst. Growth Des. 2007, 7, 598–599.
(2) (a) Saha, B. K.; Nangia, A.; Jaskólski, M. CrystEngComm 2005, 7, 355–358. (b) Aakeröy, C. B.; Desper, J.; Helfrich, B. A.; Metrangolo, P.; Pilati, T.; Resnati, G.; Stevenazzi, A. Chem. Commun. 2007, 4236–4238. (c) Bouchmella, K.; Boury, B.; Dutremez, S. G.; van der Lee, A. Chem. Eur. J. 2007, 13, 6130–6138. (d) Aakeröy, C. B.; Hussain, I.; Forbes, S.; Desper, J. CrystEngComm 2007, 9, 46–54.
(3) (a) Friščić, T.; MacGillivray, L. R. Croat. Chem. Acta 2006, 79, 327–333. (b) Horiuchi, S.; Kumai, R.; Tokura, Y. Chem. Commun. 2007, 2321–2329. (c) Maspoch, D.; Domingo, N.; Roques, N.; Wurst, K.; Tejada, J.; Rovira, C.; Ruiz-Molina, D.; Veciana, J. Chem. Eur. J. 2007, 13, 8153–8163.
(4) (a) Almarsson, Ö.; Zaworotko, M. J. Chem. Commun. 2004, 1889–1896. (b) Vishweshwar, P.; McMahon, J. A.; Bis, J. A.; Zaworotko, M. J. J. Pharm. Sci. 2006, 95, 499–516. (c) Reddy, L. S.; Babu, N. J.; Nangia, A. Chem. Commun. 2006, 1369–1371.
(5) (a) Remenar, J. F.; Morissette, S. L.; Peterson, M. L.; Moulton, B.; MacPhee, J. M.; Guzmán, H. R.; Almarsson, Ö. J. Am. Chem. Soc. 2003, 125, 8456–8457. (b) Childs, S. L.; Chyall, L. J.; Dunlap, J. T.; Smolenskaya, V. N.; Stahly, B. C.; Stahly, G. P. J. Am. Chem. Soc. 2004, 126, 13335–13342. (c) Li, Z. J.; Abramov, Y.; Bordner, J.; Leonard, J.; Medek, A.; Trask, A. V. J. Am. Chem. Soc. 2006, 128, 8199–8210.
(6) (a) Trask, A. V.; Motherwell, W. D. S.; Jones, W. Cryst. Growth Des. 2005, 5, 1013–1021. (b) Trask, A. V.; Motherwell, W. D. S.; Jones, W. Int. J. Pharm. 2006, 320, 114–123. (c) Friščić, T.; Fábián, L.; Burley, J. C.; Reid, D. G.; Duer, M. J.; Jones, W. Chem. Commun. 2008, 1644–1646.
(7) (a) Walsh, R. D. B.; Bradner, M. W.; Fleischman, S.; Morales, L. A.; Moulton, B.; Rodríguez-Hornedo, N.; Zaworotko, M. J. Chem. Commun. 2003, 186–187. (b) Fleischman, S. G.; Kuduva, S. S.; McMahon, J. A.; Moulton, B.; Walsh, R. D. B.; Rodríguez-Hornedo, N.; Zaworotko, M. J. Cryst. Growth Des. 2003, 3, 909–919.
(8) (a) Haleblian, J. K. J. Pharm. Sci. 1975, 64, 1269–1288. (b) Stahl, P. H.; Wermuth, C. G., Eds. Handbook of Pharmaceutical Salts: Properties, Selection and Use; Wiley-VCH/VHCA: Weinheim/Zurich, 2002.
(9) Desiraju, G. R. Angew. Chem., Int. Ed. Engl. 1995, 34, 2311–2327.
(10) Siegler, M. A.; Fu, Y.; Simpson, G. H.; King, D. P.; Parkin, S.; Brock, C. P. Acta Crystallogr., Sect. B 2007, 63, 912–925.
(11) (a) Morissette, S. L.; Almarsson, Ö.; Peterson, M. L.; Remenar, J. F.; Read, M. J.; Lemmo, A. V.; Ellis, S.; Cima, M. J.; Gardner, C. R. Adv. Drug Delivery Rev. 2004, 56, 275–300. (b) Stahly, G. P. Cryst. Growth Des. 2007, 7, 1007–1026.
(12) Allen, F. H. Acta Crystallogr., Sect. B 2002, 58, 380–388.
(13) van de Streek, J. Acta Crystallogr., Sect. B 2006, 62, 567–579.
(14) (a) http://www.iupac.org/inchi/ (b) Stein, S. E.; Heller, S. R.; Tchekhovski, D. An Open Standard for Chemical Structure Representation: The IUPAC Chemical Identifier. In Proceedings of the 2003 International Chemical Information Conference (Nimes); pp 131–143.
(15) (a) Görbitz, C. H.; Hersleth, H.-P. Acta Crystallogr., Sect. B 2000, 56, 526–534. (b) Nangia, A.; Desiraju, G. R. Chem. Commun. 1999, 605–606.
(16) RPluto: http://www.ccdc.cam.ac.uk/free_services/rpluto.
(17) JOElib2: a Java-based computational chemistry package, http://joelib.sourceforge.net.
(18) Sybyl 7.0; Tripos Inc.: St. Louis, MO, USA.
(19) Venables, W. N.; Ripley, B. D. Modern Applied Statistics; Springer: New York, 2002.
(20) (a) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2006. (b) http://www.r-project.org/
(21) Wildman, S. A.; Crippen, G. M. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873.
(22) (a) Manly, C. J.; Louise-May, S.; Hammer, J. D. Drug Discov. Today 2001, 6, 1101–1110. (b) Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. J. Chem. Inf. Model. 2008, 48, 220–232.
(23) Pidcock, E.; Motherwell, W. D. S. Chem. Commun. 2003, 3028–3029.
(24) Infantes, L.; Motherwell, W. D. S. Chem. Commun. 2004, 1166–1167.
(25) Hunter, C. A. Angew. Chem., Int. Ed. 2004, 43, 5310–5324.
(26) Galek, P. T. A.; Fábián, L.; Motherwell, W. D. S.; Allen, F. H.; Feeder, N. Acta Crystallogr., Sect. B 2007, 63, 768–782.
(27) Garcia-Tellado, F.; Geib, S. J.; Goswami, S.; Hamilton, A. D. J. Am. Chem. Soc. 1991, 113, 9265–9269.
(28) Quinn, J. R.; Zimmerman, S. C. Org. Lett. 2004, 6, 1649–1652.
(29) Suzuki, T.; Fukushima, T.; Yamashita, Y.; Miyashi, T. J. Am. Chem. Soc. 1994, 116, 2793–2803.
(30) Amai, M.; Kamijo, M.; Nagase, H.; Endo, T.; Ueda, H. Anal. Sci.: X-Ray Struct. Anal. Online 2005, 21, x9.
(31) Ward, M. D.; Horner, M. J. CrystEngComm 2004, 6, 401–407.
(32) Friščić, T. Private communication.

CG800861M