A Novel Description of the Crystal Packing of Molecules
Elna Pidcock* and W. D. Sam Motherwell
The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
Received November 12, 2003; Revised Manuscript Received March 25, 2004
CRYSTAL GROWTH & DESIGN 2004 VOL. 4, NO. 3 611-620
ABSTRACT: The question of why molecules pack to form particular crystal structures is occupying many researchers throughout the scientific community. Much emphasis has been placed on the study of intermolecular interactions and the search for “structure-directing” motifs. However, examination of experimental crystal structures contained within the Cambridge Structural Database has led us to propose a new, conceptually simple model of crystal packing that describes the arrangements molecules adopt in unit cells. The model was inspired by consideration of arrangements of boxes (with three unequal dimensions) stacked with faces touching and edges aligned. For a fixed number of boxes, there are only a limited number of arrangements possible, and these arrangements, or packing patterns, are of the same volume but different surface area. Applied to crystal structures, the model describes unit cells in terms of multiples (pattern coefficients) of molecular dimensions. The different packing patterns are not populated equally by experimental crystal structures, and it is found that the most populated packing patterns are those that are characterized by low surface area. Correlations between broadly defined molecular shapes and packing patterns have been observed which indicate that molecular aggregation is a useful method for moderating awkward (high surface area for volume) molecular shapes. A limited number of crystal structure prediction trials were performed with the reduced search space afforded by estimated unit cell dimensions (from molecular dimensions), and an increase in success rate was observed.
Introduction
Understanding how molecular crystal structures are formed, rationalizing the occurrence of the resulting structure or structures, and understanding the stability of these structures are all areas of growing scientific interest.
There are many avenues to explore in these fields, and many approaches are taken, from experiments in the control of crystal growth in the laboratory to the ab initio prediction of crystal structures. Successes have been reported in many areas, but overall our understanding of crystal structures remains incomplete. For example, although awareness of polymorphism has increased in recent years, it is not known if all compounds should exist in more than one polymorphic form,1 and the successful prediction of a molecular crystal structure is often the exception rather than the rule.2-4 One approach to broadening the understanding of crystal structures is based on the analysis of data contained within the Cambridge Structural Database (CSD),5,6 a database of ca. 300,000 organic and organometallic crystal structures. Such knowledge mining studies of the CSD have led to, for example, the identification and classification of intermolecular interactions such as hydrogen bonding and the examination of the role that such interactions play in governing the structure of molecular arrays, the development of the idea of molecular synthons as building blocks from which crystal structures are constructed, and the elucidation of relationships between molecular and crystallographic symmetry, to name but a few.7
From a recent analysis of experimental crystal structures in the CSD, a new model of crystal packing was proposed.8 The model is based on arrangements of a discrete number of boxes (with three unequal dimensions, l > m > s) stacked with faces touching and edges aligned. In this box model, there are a small number of ways of stacking, for example, four boxes, and these are shown in Figure 1. The dimensions of the container that encloses the array of boxes can be described in terms of multiples of the box dimensions; for example, pattern type 221(l) gives a container that has dimensions 2m, 2s, and 1l. For each of the arrays of boxes given in Figure 1, the enclosing container has the same volume, but the arrangements differ in terms of surface area of the container. For example, it is clear that of the 221 patterns the pattern type with the least surface area (and most equal dimensions) is 221(l). Similar considerations apply when stacking two boxes or eight boxes: there are a limited number of packing patterns and the arrangements are distinguishable by surface area. These packing patterns represent an efficient way of filling space by a discrete number of objects with three unequal dimensions.
Crystal structures can be thought of in similar terms; for the majority of structures, the unit cell is a container of an integer number of molecules. A very large proportion (79%) of the structures contained within the CSD belong to one of five space groups, P21/c (35.5%), P1̄ (21.6%), P212121 (8.6%), C2/c (7.7%), and P21 (5.6%), where the number of molecules within these unit cells (assuming a maximum of one molecule in the asymmetric unit, Z′ = 1) is 1, 2, 4, or 8. The box model above describes packing patterns in terms of the number of boxes within the container, and hence when the model is applied to crystal structures it is not space group specific. The same packing patterns are applicable to all Z = 4 structures belonging to space groups P21/c, P212121, and C2/c, for example. Thus, according to the box model there are three types of Z = 2 structure, six types of Z = 4 structure, and eight types of Z = 8 structure. It was found from consideration of the ratios of cell axes to molecular dimensions (L > M > S) that experimental crystal structures could be assigned to the above pattern types.
Figure 1. Illustration of packing patterns for four boxes of unequal dimensions l > m > s. Top, three ways of stacking boxes categorized as the 221 packing pattern. Bottom, three ways of stacking boxes in the 114 packing pattern.
Figure 2. Crystal structure of (CSD Refcode) ABOROY9 assigned to the 221(S) packing pattern. Top, view down unit cell b axis. Molecules are enclosed in boxes to illustrate the relationship between the molecular dimensions (L > M > S) and cell dimensions. Thus, unit cell axis c ≈ 2L and unit cell axis a ≈ 2M. Bottom, view down c showing that unit cell axis b ≈ 1S. Two molecules are removed from the unit cell for clarity.
Figure 3. A diagram illustrating the use of the principal axes of inertia (PAI) of a molecule in the determination of the dimensions L and M of the molecule.
* To whom correspondence should be addressed. The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, England. Phone: +44 (1223) 762 531. Fax: +44 (1223) 336 033. E-mail: [email protected]
10.1021/cg034216z CCC: $27.50 © 2004 American Chemical Society Published on Web 04/10/2004
It should be noted that we are not proposing that molecules can be described as solid boxes that stack one on top of each other with no interpenetration of layers, but that molecular packing in unit cells has been shown to have similarities with the packing of boxes described above. An example of a 221(S) structure is shown in Figure 2. The dimensions of the unit cell are described approximately by 2L × 2M × 1S.
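The box model described above can be made concrete with a short enumeration. The following is a minimal sketch, not the authors' procedure: it lists every way of writing a number of boxes as an ordered product of three repeat counts (along l, m, and s) and ranks the resulting containers by surface area. The function names and the example box dimensions are illustrative assumptions.

```python
from itertools import product


def packing_patterns(n_boxes):
    """Return all ordered triples (n_l, n_m, n_s) with n_l * n_m * n_s == n_boxes.

    Each entry is the number of repeats of the long (l), medium (m) and
    short (s) box edge along one edge of the enclosing container, so the
    pattern the text calls 221(l) (1l x 2m x 2s) is the triple (1, 2, 2).
    """
    counts = range(1, n_boxes + 1)
    return [p for p in product(counts, repeat=3) if p[0] * p[1] * p[2] == n_boxes]


def container_surface_area(pattern, box):
    """Surface area of the container for a repeat pattern and a box (l, m, s)."""
    x, y, z = (n * d for n, d in zip(pattern, box))
    return 2.0 * (x * y + y * z + x * z)


# Hypothetical box with l > m > s, in arbitrary length units.
box = (3.0, 2.0, 1.5)

# Every container of four boxes has the same volume (4 * l * m * s),
# so the patterns differ only in surface area.
ranked = sorted(packing_patterns(4), key=lambda p: container_surface_area(p, box))
for pattern in ranked:
    print(pattern, container_surface_area(pattern, box))
```

For this particular box (m < 2s) the lowest surface area is obtained for (1, 2, 2), that is, 221(l), in line with the remark in the Introduction.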
The previous study did not explicitly take into consideration the orientation of the molecule within the cell. Instead, the best fit to a pattern type was found by taking all permutations of the cell axes with the molecular dimensions. In this paper, we include the orientation of the molecule within the cell in the determination of pattern type. In analogy with the box model, the lengths of unit cell axes are found to be multiples of the molecular dimensions and these multiples, henceforth pattern coefficients, are calculated and presented. The packing patterns found to represent the majority of experimental crystal structures are those that are characterized by low surface area. Correlations between molecular shape and the packing pattern of the crystal structure are observed. We believe this conceptually simple model has introduced a new vocabulary with which to discuss crystal structures and has provided a novel and fundamental framework upon which to build our understanding of crystal structures.
Theoretical Calculations
Calculation of Pattern Coefficients. Datasets of structures belonging to the above, most common space groups (no alternative settings were allowed) were generated by searching the CSD (Nov 2002 release) using Conquest. Structures comprising molecules of more than one chemical type and structures with more than one molecule in the asymmetric unit were excluded. The numbers of structures belonging to the space group datasets were as follows: P21/c 15882, P1̄ 4857, P212121 8492, C2/c 7782, P21 6638. To probe the relationships between molecular dimensions and cell dimensions, the molecular dimensions and orientation of the molecule within the unit cell were established. To determine the molecular dimensions in a coordinate frame independent of the unit cell, the principal axes of inertia (PAI) of the molecule were utilized.
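As an illustration of a cell-independent molecular frame, the sketch below computes principal axes of inertia with NumPy and measures the extent of the molecule along each axis. It is not the RPLUTO implementation: the function name, the padding of each atom by a van der Waals radius supplied by the caller, and the input arrays are assumptions made for the example.

```python
import numpy as np


def molecular_dimensions(coords, masses, radii):
    """Return (dims, axes): extents L >= M >= S and the matching principal axes.

    coords : (n, 3) Cartesian atomic coordinates
    masses : (n,) atomic masses
    radii  : (n,) van der Waals radii used to pad the extents (assumed input)
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Shift the origin to the centre of mass.
    centred = coords - np.average(coords, axis=0, weights=masses)

    # Inertia tensor: sum_i m_i * (|r_i|^2 * I - r_i r_i^T).
    r2 = np.sum(centred ** 2, axis=1)
    per_atom = r2[:, None, None] * np.eye(3) - centred[:, :, None] * centred[:, None, :]
    inertia = np.einsum("i,ijk->jk", masses, per_atom)

    # Columns of `axes` are the (orthonormal) principal axes.
    _, axes = np.linalg.eigh(inertia)

    # Extent along each axis: maximum minus minimum coordinate, including radii.
    proj = centred @ axes
    extents = (proj + radii[:, None]).max(axis=0) - (proj - radii[:, None]).min(axis=0)

    order = np.argsort(extents)[::-1]  # longest dimension first
    return extents[order], axes[:, order]
```

For three collinear atoms spaced 1 Å apart with radii of 1.7 Å, this gives dimensions of 5.4, 3.4, and 3.4 Å, with the longest dimension along the line of atoms.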
Thus, the three perpendicular principal axes were calculated for each molecule, and the dimensions of the molecule L, M, S were defined as the difference between the maximum and minimum coordinates including van der Waals radii along the principal axes, see Figure 3. Thus, the above datasets were processed by RPLUTO10 to generate a file containing the lengths of the unit cell axes, molecule dimensions (as defined above), and direction cosines describing the orientation of the principal axes of inertia with respect to an orthogonal cell. The most acute angle between each PAI
and the cell axes (of an orthogonal cell) was determined, and the three values were sorted in ascending order. The smallest angle between a PAI and a cell axis was used to correlate the PAI (and hence a molecular dimension, L, M, or S) with a particular cell axis. This procedure was repeated to determine the relationship between the next PAI and a cell axis. The orientation of the molecule in the cell was determined with respect to two cell axes and the third relationship between the remaining PAI and cell axis was defined by elimination. Thus, each molecular dimension (Dmol) was paired with a particular cell axis (Dcell). The orientation of the molecule within the cell does not have to be known precisely; the desired result is that each molecular dimension is related to the cell axis to which it is most closely aligned. It is acknowledged that the use of an orthogonalized cell may lead to errors in the description of the orientation of the molecule within the unit cell, particularly for cells with oblique angles. However, a survey of 2000 structures belonging to the triclinic space group, P1̄, where Z = 2, yielded an average cell angle of 92 ± 14°. Therefore, it is likely that for the majority of structures, the above method used to establish the orientation of the molecule in the cell is adequate. Included in the P21/c dataset are structures with very oblique β angles that, if transformed to P21/n, would yield a cell with a β angle closer to 90°. A survey of 10 000 structures assigned to the space group P21/c showed that approximately 5% had a β angle greater than 120°. Thus, the presence of a small percentage of “standardized” P21/n structures does not significantly compromise the integrity of the P21/c dataset. In the box model above, the pattern coefficients (container dimension/box dimension) had integer values. The pattern coefficients (CL,M,S) were calculated for experimental structures with the following equation:
$$C_{L,M,S} = D_{\mathrm{cell}} / D_{\mathrm{mol}}$$
where Dmol = L, M, S. Hence, the orientation of the molecule within the cell was established and pattern coefficients were calculated without ambiguity.
The Pattern Coefficients. Distributions of calculated pattern coefficients for Z = 4 structures belonging to space group P21/c are shown in Figure 4 (top). It can be seen that there are three clear and distinct peaks in the distribution at the values of approximately 0.85, 1.50, and 2.80. A histogram of pattern coefficients calculated for structures belonging to P21 is shown in Figure 4 (bottom). Again, distinct peaks are seen at pattern coefficient values of 0.85 and 1.4. It is clear from the histograms that molecular dimensions are related to unit cell dimensions. From the box model above, it was shown that there was one packing pattern family for two boxes, that of the 112 family. If the box model were a viable description of experimental crystal structures, a histogram of calculated pattern coefficients would be expected to show a peak at approximately 1 that was twice as large as a peak at approximately 2 (since for each structure, two cell axis lengths are approximately equal to a molecular dimension and one cell axis length is approximately twice a molecular dimension). This is in close agreement with the histogram of Figure 4 (bottom). However, the most frequently observed pattern coefficients of experimental structures
Figure 4. Histograms showing the frequency of occurrence of pattern coefficients calculated for Z = 4 structures belonging to space group P21/c (top) and Z = 2 structures belonging to P21 (bottom). The histograms include values calculated for all pattern coefficients, CL, CM, and CS.
do not have integer values. Molecules do not pack with “faces touching and edges aligned”; rather, there is interpenetration of layers of molecules. In addition, the molecules are not aligned exactly parallel with the unit cell axes. Both of these factors result in a reduction of pattern coefficient from the box-model ideals of 1 and 2. For Z = 4 structures there are two packing pattern families, namely, 221 and 114. Thus, a histogram of pattern coefficients calculated from Z = 4 structures would be expected to show large peaks at approximately 1 and 2, and a smaller peak at 4. Again, this is in close agreement with the histogram shown, although the most frequently observed pattern coefficients for experimental structures do not have integer values.
Assignment of Packing Patterns to Experimental Structures. Two complementary methods have been used to assign the thousands of experimental structures to a packing pattern, and average pattern coefficients for each pattern type have been calculated. Initially, target pattern coefficients were chosen and the Euclidean distance metric was used to establish how well the calculated pattern coefficients for each molecule agreed with the target values. In the box model, the pattern coefficients are 1x, 2x, 4x, where x = 1. From
Figure 5. Topological map of pattern types assigned to Z = 4 structures belonging to P212121 generated by a Kohonen Network. The labeled blocks of color show the clusters of pattern types. Units are not labeled with a pattern type when the structures of the unit belong to more than one pattern type.
the histograms shown in Figure 4 it appears that a value of x = 0.8 is more appropriate for experimental structures. Hence, the target pattern coefficients for pattern 221(L) were 1.6, 1.6, and 0.8. For each molecule, the “proximity” (calculated as the RMS difference) of the calculated pattern coefficients to target pattern coefficients for each pattern type of the relevant Z value was calculated and the pattern type that gave the best fit was chosen. Thus, every structure was assigned to a pattern type. Structures assigned to the same packing pattern were collated, and average pattern coefficients were calculated. These data are presented in Supporting Information (Tables S1-S4) for Z = 2, 4, and 8 structures. The standard deviations of the mean values calculated indicate that a reasonably large spread of values is observed for the experimentally derived pattern coefficients. The second method employed neural networks in the classification of pattern types.11 The Kohonen neural network is able to perform unstructured learning where only input values are used to cluster data.12 Therefore, given just the pattern coefficients for each structure, the Kohonen network is able to cluster structures without the need to assign “target” values. The following procedure for assigning a structure to a pattern type was performed on all datasets. The Kohonen network was trained on a subset of the dataset (for example, 2000 of the 8492 structures in the Z = 4, P212121 dataset were used in training) using only the three calculated pattern coefficients for each structure. The training was performed in two stages, first with 100 epochs and a learning rate of 0.2, followed by 1000 epochs with a learning rate of 0.05. This regime led to a stable clustering of the structures with no further decrease in the reported error for the dataset.
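The first, target-based assignment can be sketched as follows for Z = 4. The sketch pairs each principal axis with the cell axis of most acute angle (largest absolute direction cosine), computes C = Dcell/Dmol, and picks the pattern whose targets (box-model integers scaled by x = 0.8) give the smallest RMS difference. The greedy pairing is one reading of the procedure described earlier, and all function names and the example cell are hypothetical.

```python
import numpy as np

X = 0.8  # scale factor applied to the box-model integers, as suggested by Figure 4

# Box-model multiples of (L, M, S) for the six Z = 4 pattern types.
Z4_PATTERNS = {
    "221(L)": (1, 2, 2), "221(M)": (2, 1, 2), "221(S)": (2, 2, 1),
    "114(L)": (4, 1, 1), "114(M)": (1, 4, 1), "114(S)": (1, 1, 4),
}


def pair_axes(direction_cosines):
    """Greedily pair principal axes (rows: L, M, S) with cell axes (columns).

    The pair with the largest |cos| (most acute angle) is fixed first, then
    the next largest among what remains; the last pair follows by elimination.
    Returns axis_for, where axis_for[i] is the cell-axis index for dimension i.
    """
    c = np.abs(np.asarray(direction_cosines, dtype=float))
    axis_for = [None, None, None]
    for _ in range(3):
        i, j = np.unravel_index(np.argmax(c), c.shape)
        axis_for[i] = int(j)
        c[i, :] = -1.0  # remove this principal axis from further pairing
        c[:, j] = -1.0  # remove this cell axis from further pairing
    return axis_for


def pattern_coefficients(cell_lengths, mol_dims, axis_for):
    """C_L, C_M, C_S = D_cell / D_mol for the paired axes."""
    return np.array([cell_lengths[axis_for[i]] / mol_dims[i] for i in range(3)])


def assign_pattern(coeffs, patterns=Z4_PATTERNS, x=X):
    """Return (best pattern name, RMS difference to its target coefficients)."""
    coeffs = np.asarray(coeffs, dtype=float)
    rms = {name: float(np.sqrt(np.mean((coeffs - x * np.asarray(mult)) ** 2)))
           for name, mult in patterns.items()}
    best = min(rms, key=rms.get)
    return best, rms[best]


# Hypothetical structure: L, M, S lie roughly along c, a, b respectively.
cosines = [[0.10, 0.05, 0.99], [0.98, 0.10, 0.10], [0.05, 0.97, 0.05]]
axis_for = pair_axes(cosines)
coeffs = pattern_coefficients((11.2, 3.2, 16.0), (10.0, 7.0, 4.0), axis_for)
print(axis_for, coeffs, assign_pattern(coeffs))
```

The example cell (a ≈ 1.6M, b ≈ 0.8S, c ≈ 1.6L) is assigned to 221(S), the same pattern type as the structure illustrated in Figure 2.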
The network grouped the structures based on the proximity of the pattern coefficients to the network’s evolving exemplars; thus, when two input data vectors were similar (i.e., similar pattern coefficients) they were mapped onto the same unit, or units close to one another on a two-dimensional, topological map, Figure 5. When all structures assigned to a unit on the map were found to belong to the same pattern type (as determined by the Euclidean distance method above) the unit was labeled with the pattern
type. When the unit encompassed structures that belonged to two or more pattern types the unit was not labeled. From the topological map, Figure 5, it can be seen that the Kohonen network has grouped like structures with like; structures assigned to a particular pattern type using the Euclidean distance metric are not scattered randomly across the topological map. The clustering of the structures on the map makes intuitive sense: it is understandable that the pattern type 114(S) (∼1L, ∼1M, ∼4S) has boundaries with 221(M) (∼2L, ∼1M, ∼2S) and 221(L) (∼1L, ∼2M, ∼2S) but not 221(S) (∼2L, ∼2M, ∼1S). Every location on the map is populated by some structures, and there are some locations on the map (unlabeled) that represent structures belonging to two different pattern types. The trained network was then run on the remainder of the structures belonging to the dataset. When the trained network was run on unseen data, it generally did not assign a structure to a pattern when the pattern coefficients of the structure placed it in a boundary location on the map. Structures that did not match well with the network’s exemplar for a unit were rejected. In the case of Z = 4 structures in P212121, 9% of structures were rejected by the Kohonen network. A similar rate of rejection was observed for all the datasets. The structures assigned to a particular pattern type by the Kohonen network were collated and the rejected structures were ignored. Each structure belonging to a pattern type dataset had three pattern coefficients associated with it (CL,M,S) and, using Microsoft Excel, the mean and standard deviation (σ) for each of the distributions of CL,M,S were calculated. This procedure was repeated for all space group datasets, and the results are given in Tables 1-4. The removal of outlying structures by the network classification has reduced the standard deviations of the pattern coefficients, in some cases, by a significant amount.
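For readers unfamiliar with Kohonen networks, the clustering step can be sketched in a few lines of NumPy. This is a generic self-organizing map, not the software used in the study. Only the two-stage regime (100 epochs at a learning rate of 0.2, then 1000 epochs at 0.05) is taken from the text; the map size, Gaussian neighbourhood, linear shrinking of the neighbourhood radius, and random initialization are assumptions, and the rejection of poorly matched structures is not modelled.

```python
import numpy as np


def train_som(data, grid=(6, 6), stages=((100, 0.2), (1000, 0.05)), seed=0):
    """Train a small self-organizing map on rows of pattern coefficients.

    Returns (weights, positions): one exemplar vector per map unit and the
    (row, column) position of each unit on the two-dimensional map.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    rows, cols = grid
    positions = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Exemplars start at random points within the range of the data.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, data.shape[1]))

    total_epochs = sum(n for n, _ in stages)
    epoch = 0
    for n_epochs, rate in stages:
        for _ in range(n_epochs):
            # Neighbourhood radius shrinks linearly to a floor of 0.5 map units.
            sigma = max(0.5, 0.5 * max(rows, cols) * (1.0 - epoch / total_epochs))
            for x in data[rng.permutation(len(data))]:
                bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
                d2 = np.sum((positions - positions[bmu]) ** 2, axis=1)
                h = np.exp(-d2 / (2.0 * sigma ** 2))
                # Move the winning unit and its map neighbours toward the input.
                weights += rate * h[:, None] * (x - weights)
            epoch += 1
    return weights, positions


def best_matching_unit(weights, x):
    """Index of the map unit whose exemplar is closest to x."""
    return int(np.argmin(np.sum((weights - np.asarray(x, dtype=float)) ** 2, axis=1)))
```

After training, structures with similar pattern coefficients map to the same or neighbouring units, which is the behaviour that allows units to be labeled with a pattern type.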
These pattern coefficients are taken to be a good representation of the datasets and will be used throughout the rest of the paper.
Results
Regularity of Pattern Coefficients. It can be seen from Tables 1-4 that pattern coefficients calculated for structures belonging to different space groups but the same pattern are similar. The pattern coefficients calculated for experimental structures are not integer as in the box model. The known tendency of molecules to pack with bumps into hollows to form interpenetrating layers will result in a reduction of pattern coefficient from integer values. In addition, the molecules are not aligned parallel to the unit cell axes. Although there is some variation, it appears that the calculated pattern coefficients are consistent across pattern types. Hence, in pattern 221(L), CL has values 0.83-0.88 and in pattern types 114(S) and 114(L) CL has values between 0.79 and 0.88. The consistency of the values of the calculated pattern coefficients supports the notion that crystal structures are close-packed as described by Kitaigorodskii.13 These results point to an underlying principle of molecular aggregation that is dependent primarily on molecular shape.
Pattern Coefficients and Molecular Shape. A relationship between molecular shape and pattern
Table 1. Average Pattern Coefficients (CL,M,S) and Standard Deviations (σ) Calculated for Z = 4 Structures Belonging to the 221 Pattern Category
pattern type   space group   CL ± σ        CM ± σ        CS ± σ        structures
221(L)         P21/c         0.83 ± 0.13   1.48 ± 0.26   1.58 ± 0.29   5117
221(L)         P212121       0.83 ± 0.14   1.50 ± 0.26   1.52 ± 0.29   2312
221(L)         C2/c          0.88 ± 0.16   1.44 ± 0.25   1.55 ± 0.28   1268
221(M)         P21/c         1.42 ± 0.22   0.88 ± 0.13   1.56 ± 0.28   2986
221(M)         P212121       1.47 ± 0.25   0.86 ± 0.12   1.49 ± 0.27   1929
221(M)         C2/c          1.42 ± 0.24   0.86 ± 0.13   1.61 ± 0.19   1330
221(S)         P21/c         1.42 ± 0.23   1.48 ± 0.27   0.92 ± 0.13   2508
221(S)         P212121       1.45 ± 0.24   1.45 ± 0.23   0.90 ± 0.11   1713
221(S)         C2/c          1.44 ± 0.22   1.51 ± 0.24   0.89 ± 0.14   956
Table 2. Average Pattern Coefficients (CL,M,S) and Standard Deviations (σ) Calculated for Z = 4 Structures Belonging to the 114 Pattern Category
pattern type   space group   CL ± σ        CM ± σ        CS ± σ        structures
114(L)         P21/c         2.43 ± 0.21   0.84 ± 0.11   0.92 ± 0.11   166
114(L)         P212121       2.50 ± 0.24   0.85 ± 0.10   0.87 ± 0.10   211
114(L)         C2/c          -             -             -             0
114(M)         P21/c         0.79 ± 0.11   2.66 ± 0.45   0.95 ± 0.16   427
114(M)         P212121       0.79 ± 0.13   2.69 ± 0.37   0.92 ± 0.13   665
114(M)         C2/c          0.74 ± 0.15   2.65 ± 0.37   0.81 ± 0.13   9
114(S)         P21/c         0.78 ± 0.16   0.84 ± 0.11   2.87 ± 0.56   1222
114(S)         P212121       0.77 ± 0.16   0.89 ± 0.18   2.90 ± 0.63   902
114(S)         C2/c          0.88 ± 0.22   0.96 ± 0.30   2.73 ± 0.49   138
Table 3. Average Pattern Coefficients (CL,M,S) and Standard Deviations (σ) Calculated for Z = 2 Structures Belonging to the 112 Pattern Category
pattern type   space group   CL ± σ        CM ± σ        CS ± σ        structures
112(L)         P1̄            1.25 ± 0.17   0.86 ± 0.10   0.90 ± 0.10   569
112(L)         P21           1.28 ± 0.18   0.83 ± 0.11   0.87 ± 0.11   1113
112(L)         P21/c         1.29 ± 0.20   0.81 ± 0.12   0.90 ± 0.17   320
112(M)         P1̄            0.80 ± 0.12   1.31 ± 0.21   0.92 ± 0.12   1194
112(M)         P21           0.77 ± 0.13   1.35 ± 0.23   0.90 ± 0.14   2299
112(M)         P21/c         0.72 ± 0.15   1.45 ± 0.28   0.95 ± 0.18   682
112(S)         P1̄            0.78 ± 0.13   0.88 ± 0.13   1.44 ± 0.29   2784
112(S)         P21           0.77 ± 0.14   0.86 ± 0.13   1.47 ± 0.29   2836
112(S)         P21/c         0.72 ± 0.16   0.84 ± 0.19   1.66 ± 0.49   1066
Table 4. Average Pattern Coefficients (CL,M,S) and Standard Deviations (σ) Calculated for Z = 8 Structures Belonging to the 421 and 222 Pattern Categories(a)
pattern type   CL ± σ        CM ± σ        CS ± σ        structures
421            2.71 ± 0.30   1.57 ± 0.18   0.91 ± 0.11   255
142            0.83 ± 0.14   2.75 ± 0.36   1.66 ± 0.30   255
214            1.55 ± 0.23   0.87 ± 0.12   2.99 ± 0.50   693
412            2.75 ± 0.34   0.88 ± 0.12   1.59 ± 0.20   184
241            1.56 ± 0.22   2.82 ± 0.35   0.90 ± 0.13   412
124            0.84 ± 0.17   1.60 ± 0.30   2.95 ± 0.51   521
222            1.52 ± 0.31   1.58 ± 0.32   1.60 ± 0.32   890
(a) No structures are assigned to the 118 pattern category.
coefficients was found when the volume of the box described by the three perpendicular molecular dimensions (Bvol) was compared with a molecular volume determined by tracing out in three-dimensional space the regions occupied by the molecule (Mvol, calculated using a standard RPLUTO command). For a molecule with an open structure, for example a T-shaped or L-shaped molecule, Bvol is much greater than Mvol. In other words, the space encompassed by the three molecular dimensions encloses empty regions within the frame of the molecule. A chart showing the change in pattern coefficients as the ratio of box volume/molecular volume increases for Z = 4 structures in P21/c is given in Figure 6. It can be seen that the pattern coefficients decrease as the ratio of Bvol/Mvol increases. Thus, less “box-like”, open-structured molecules fit together efficiently (bumps into hollows), and the cell axes shorten with respect to the expected values of the model
(calculated pattern coefficients decrease). An example of a molecule with a high ratio of box volume to molecular volume (Bvol/Mvol = 5.5) is shown in Figure 6. It is clear that boxes describing these molecules are heavily overlapped, and thus the calculated pattern coefficients approach the values 1,1,1. For a molecule with a low ratio of box volume to molecular volume (Figure 6), it can be seen that the “molecular boxes” do not have a great deal of overlap, and the structure represents a good agreement with the packing pattern model and hence the a and the b axes are described by ∼0.8(L) and ∼1.6(M), respectively.
Classification of Structures. The identification of packing patterns in experimental crystal structures has provided a method for the classification of structures. In the field of polymorphism, the understanding and discussion of polymorphic forms of a compound relies on the ability to differentiate between the forms and express the differences clearly. Although the families of packing patterns do not provide a description of intermolecular interactions, they do provide a fundamental description of a crystal structure to which a more detailed analysis can be added. A cursory look at known polymorphic structures, for example, BIGTUI/BIGTUI0116 and ATPRCL0117 and ATPRCL1018 (Figure 7) shows that the different polymorphic forms often correspond to different packing patterns. Thus, BIGTUI corresponds to the 114(S) packing pattern (in P21/c) and BIGTUI01 is described by the 221(L) pattern (in P212121). The polymorphic forms of ATPRCL01/10 both crystallize
Figure 6. Chart showing the change in CL, CM, and CS with increasing box volume/molecular volume ratio. Two crystal structures (P21/c) are shown, KEWKAY14 (right) and AZIDCH15 (left), with two molecules removed from each unit cell for clarity. KEWKAY (right) has an open structure that allows bump into hollow type packing, and hence the enclosing boxes are heavily overlapped, Bvol/Mvol = 5.5. AZIDCH (left), where Bvol/Mvol is 2.95, shows very little overlap of the molecules.
Discussion
Figure 7. Top, two polymorphs, BIGTUI (left) and BIGTUI01 (right) with assigned packing patterns. Bottom, two polymorphs ATPRCL01 (left) and ATPRCL10 (right) with assigned packing patterns.
in P212121, but ATPRCL01 belongs to the 221(M) packing pattern and ATPRCL10 is described by the 114(L) packing pattern. The packing patterns provide a simple yet illuminating description of crystal structures: a description of the orientation of molecules with respect to each other and an idea of the shape of the cell are encapsulated in the packing pattern name.
Surface Area Considerations. To return to the box model, it was noted in the introduction that patterns constructed from the same number of boxes have the same volume but different surface areas. Low surface area containers are achieved by minimizing the number of repeats of the longest box dimension and maximizing the repeats of the shortest box dimension (Figure 1). Thus, of the 221 pattern family, 221(l) has the minimum surface area and 221(s) has the maximum surface area. Of the 114 pattern family, 114(s) has the minimum surface area and 114(l) has the maximum surface area. The four repeats of a box dimension in the 114 pattern family almost guarantee that the container surface area of such a structure will be greater than that of a structure built from a 221 packing pattern. Comparing the 221 patterns with the 114 patterns in terms of surface area shows that the 114(s) pattern (the 114 pattern with the best surface area to volume properties) has a lower surface area than 221(l) only when m > 2s. If the box dimensions are similar to each other (approaching a cube) then the distinction between pattern types of a family in terms of surface area is lost. The frequency of occurrence of pattern types is not uniform in experimental crystal structures. For Z = 4 structures, structures that belong to the 221 pattern family are more common than structures that belong to the 114 pattern family. Within the 221 pattern family, pattern type 221(L) is assigned to 8697/20119 (43%) structures and pattern type 221(S) is assigned to 5177/20119 (26%) structures. For Z = 2 structures, the pattern type 112(S) is observed most often (52% of dataset) and 112(L) is observed relatively rarely (16% of dataset). Structures that are assigned to a member of the 114 pattern family most often belong to the 114(S) pattern type (2262/3740, 60%).
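The condition m > 2s quoted above follows directly from the container dimensions of the two patterns. As a short worked check, writing A for the container surface area:

```latex
\begin{align*}
A_{221(l)} &= 2\left[(l)(2m) + (l)(2s) + (2m)(2s)\right] = 4lm + 4ls + 8ms,\\
A_{114(s)} &= 2\left[(l)(m) + (l)(4s) + (m)(4s)\right] = 2lm + 8ls + 8ms,\\
A_{221(l)} - A_{114(s)} &= 2lm - 4ls = 2l\,(m - 2s).
\end{align*}
```

The difference is positive, so that 114(s) has the lower surface area, exactly when m > 2s; it does not depend on l beyond the overall factor.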
In agreement with the above observations, the most populated pattern types for Z = 8 structures are those with good surface area for volume properties, i.e., 222, 1(L)2(M)4(S) and 2(L)1(M)4(S). It can be seen from the distribution of structures over the pattern types that unit cells described by pattern types with the minimum surface area are preferred. This conclusion is in agreement with the results published previously, although the detailed distribution of structures over the
Table 5. Criteria Used in Terms of Molecular Dimension Ratios to Define 3 Molecular Shape Categories
shape   criteria for L/M    criteria for L/S
cube    1.0 ≤ L/M ≤ 1.5     1.0 ≤ L/S ≤ 1.5
disk    1.0 ≤ L/M ≤ 1.5     L/S > 1.5
rod     L/M > 1.5           L/S > 1.5
Table 6. Propensity of Molecular Shape for Packing Pattern and Number of Occurrences (Nobs) of Molecular Shapes Found for Z = 4 Pattern Types in P21/c
pattern type   propensity             Nobs
               cube   disk   rod      cube   disk   rod
221(L)         0.96   0.99   1.11     2043   1907   1167
221(M)         1.08   1.04   0.77     1339   1173   474
221(S)         1.26   0.92   0.67     1298   864    346
114(L)         1.25   1.03   0.50     84     65     17
114(M)         0.90   0.90   1.43     158    144    125
114(S)         0.48   1.21   1.67     242    559    421
Total                                 5164   4712   2550
pattern types reported here is different from the previous findings. For these results, the orientation of the molecule in the cell is used to determine the pattern coefficients. Thus, the results presented here are a better representation of the crystallographic data; previously the assignment of a structure to a pattern was performed by finding the best fit between calculated and target pattern coefficients by scanning all permutations of cell dimensions with molecular dimensions. Molecular Shape Considerations. As indicated above, the container with the smallest surface area is not the same for boxes of all shapes. For example, the packing pattern that generates the container with the lowest surface area for a box with m > 2s is the 114(s) packing pattern not the 221(l) packing pattern. Here, relationships between molecular shape and packing pattern have been examined. In this analysis, three broadly defined molecular shapes have been chosen: a cube, a disk, and a rod, defined in terms of the ratios of L/M and L/S (Table 5). The expected number of molecules belonging to each combination of shape category and pattern type (Nexp) for a particular space group was calculated with the following equation:
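$$N_{\mathrm{exp}} = \left( \frac{N_{\mathrm{shape}}}{\mathrm{total}} \times \frac{N_{\mathrm{pattern}}}{\mathrm{total}} \right) \times \mathrm{total}$$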
where Nshape is the number of molecules belonging to a shape category, Npattern is the number of molecules belonging to a pattern type, and total is the total number of molecules belonging to the dataset. The propensity of a shape to be found in a packing pattern was calculated with the following:
$$\mathrm{propensity} = N_{\mathrm{obs}} / N_{\mathrm{exp}}$$
where Nobs is the number of molecules observed belonging to a shape and packing pattern. A value of propensity less than one indicates there are fewer structures belonging to a shape than expected assuming a random distribution of molecular shapes over pattern types, and a value greater than 1 indicates more structures are found than expected. The results are summarized in Tables 6-8 for Z = 4 (P21/c), Z = 2 (P21) and Z = 8 (C2/c) structures.
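As a worked illustration of the two equations above, the sketch below recomputes propensities from the Nobs counts reported in Table 6 (Z = 4, P21/c). The dictionary layout and function name are illustrative choices; recomputed values may differ from the tabulated ones in the last digit.

```python
SHAPES = ("cube", "disk", "rod")

# Observed counts (cube, disk, rod) for each Z = 4 pattern type in P21/c (Table 6).
N_OBS = {
    "221(L)": (2043, 1907, 1167),
    "221(M)": (1339, 1173, 474),
    "221(S)": (1298, 864, 346),
    "114(L)": (84, 65, 17),
    "114(M)": (158, 144, 125),
    "114(S)": (242, 559, 421),
}


def propensities(n_obs):
    """Return {pattern: (propensity for cube, disk, rod)} from observed counts."""
    total = sum(sum(row) for row in n_obs.values())
    n_shape = [sum(row[k] for row in n_obs.values()) for k in range(len(SHAPES))]
    result = {}
    for pattern, row in n_obs.items():
        n_pattern = sum(row)
        result[pattern] = tuple(
            # N_exp = (N_shape / total) * (N_pattern / total) * total
            obs / ((n_shape[k] / total) * (n_pattern / total) * total)
            for k, obs in enumerate(row)
        )
    return result


table = propensities(N_OBS)
for pattern, values in table.items():
    print(pattern, [round(v, 2) for v in values])
```

For example, cube-shaped molecules in 221(L) have Nexp = 5164 × 5117 / 12426 ≈ 2126.5, so the propensity is 2043 / 2126.5 ≈ 0.96.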
Table 7. Propensity of Molecular Shape for Packing Pattern and Number of Occurrences (Nobs) of Molecular Shapes Found for Z = 2 Pattern Types in P21

                     propensity                Nobs
pattern type     cube    disk    rod       cube    disk    rod
112(L)           1.40    1.06    0.69       315     528     270
112(M)           0.99    0.96    1.05       462     991     846
112(S)           0.85    1.01    1.08       489    1277    1070
obs                                        1266    2798    2186
Table 8. Propensity of Molecular Shape for Packing Pattern and Number of Occurrences (Nobs) of Molecular Shapes Found for Z = 8 Pattern Types in C2/c

                     propensity                Nobs
pattern type     cube    disk    rod       cube    disk    rod
4(L)2(M)1(S)     1.42    0.78    0.52       153      77      25
1(L)4(M)2(S)     1.13    0.88    0.98       122      86      47
2(L)1(M)4(S)     0.79    1.10    1.28       231     294     168
4(L)1(M)2(S)     1.45    0.69    0.63       113      49      22
2(L)4(M)1(S)     0.95    1.05    0.63       166     167      79
1(L)2(M)4(S)     0.88    1.21    0.86       193     243      85
222              1.01    0.95    1.08       380     327     183
obs                                        1358    1243     609
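The propensity calculation defined above can be illustrated with a minimal Python sketch using the Nobs entries of Table 7 (Z = 2, P21). The function name `propensities` and the dictionary layout are illustrative choices, not part of the original work, and the marginal totals are recomputed here from the individual table entries, which is an assumption about how the totals were formed.

```python
# Observed counts (Nobs) for Z = 2 structures in P21, copied from Table 7.
N_OBS = {
    "112(L)": {"cube": 315, "disk": 528, "rod": 270},
    "112(M)": {"cube": 462, "disk": 991, "rod": 846},
    "112(S)": {"cube": 489, "disk": 1277, "rod": 1070},
}


def propensities(n_obs):
    """Return {pattern: {shape: Nobs / Nexp}} for a table of observed counts."""
    shapes = sorted({shape for row in n_obs.values() for shape in row})
    # Assumption: marginal totals are recomputed from the cell entries.
    n_pattern = {p: sum(row.values()) for p, row in n_obs.items()}
    n_shape = {s: sum(row[s] for row in n_obs.values()) for s in shapes}
    total = sum(n_pattern.values())
    result = {}
    for pattern, row in n_obs.items():
        result[pattern] = {}
        for shape in shapes:
            # Nexp = (Nshape / total) x (Npattern / total) x total
            n_exp = (n_shape[shape] / total) * (n_pattern[pattern] / total) * total
            result[pattern][shape] = row[shape] / n_exp
    return result


if __name__ == "__main__":
    for pattern, row in propensities(N_OBS).items():
        print(pattern, {shape: round(value, 2) for shape, value in row.items()})
```

With these inputs the sketch gives a propensity close to 1.40 for cubic molecules in 112(L) and close to 0.69 for rods in the same pattern, in line with the values listed in Table 7.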
Molecules belonging to each of the three shape categories are found in all the pattern types. Rod- and disk-shaped molecules are found to a greater extent in packing patterns that describe unit cells with the minimum surface area, for example, 221(L), 114(S), and 112(S). As a corollary, cubic molecules, for which there is less distinction between pattern types, are found to a greater extent than expected in packing patterns whose cell axes are described by multiples of the longest molecular dimension, for example, 221(S), 114(L), 112(L), 4(L)2(M)1(S), and 4(L)1(M)2(S). Hence, for Z = 4 structures, rod-shaped molecules are more likely to adopt a packing pattern where a cell axis is described by 1L (and not 4L), and disk-shaped molecules prefer packing patterns in which the shortest molecular dimension is multiplied by ∼2 or ∼4. Cubic molecules are present to a large extent in patterns in which a cell axis is described by 4L, as there is little distinction between L and S. Thus, it appears that there is a mechanism by which an awkwardly shaped molecule with a high surface area for a given volume (for example, a rod or a disk) aggregates to form a group of molecules with much more favorable surface area to volume properties. In other words, extreme molecular shapes can be “neutralized” by the packing pattern. A further point shown by Tables 6-8 is that there is not an even distribution of molecular shapes across space groups. For example, cubic molecules account for 20% of molecules in P21, but 42% of molecules in P21/c and C2/c. In P21/c when Z = 2 (equivalent to P21), 20% of the molecules belong to the cubic shape category, whereas molecules belonging to the Z = 2, P1̄ dataset are described as cubic in 45% of cases (data not shown). Rod-shaped molecules constitute 20% of molecules in P21/c and C2/c but 35% of molecules found in P21.
Therefore, in addition to a relationship between molecular shape and packing pattern it appears that there is a relationship between molecular shape and space group, which will be the subject of further investigations. Unit Cell Axes and Packing Patterns. In the calculation of pattern coefficients (see above), for each structure, the orientation of the molecule in the cell was
Table 9.

                                    cell parameters of lowest
CSD Refcode (ref)   cell (a)        energy solution (Å)          pattern type
ICEXIX (21)         lowest energy   12.5     12.4      9.3       221(L)
                    experimental    12.79    12.75     9.55      221(L)
LAVBEP (22)         lowest energy   18.9     11.9      6.1       221(S)
                    experimental    19.00    12.04     6.10      221(S)
NOJVIR (23)         lowest energy   11.4      8.8      6.7       221(L)
                    experimental    17.53     6.37     6.16      114(M)
PELVIL (24)         lowest energy   18.8     11.0      7.0       114(S)
                    experimental    19.10    10.93     7.09      114(S)
TRIPHE12 (25)       lowest energy   18.7     16.0      3.8       221(S)
                    experimental    16.71    12.97     5.28      114(S)
VEMLUU (26)         lowest energy   16.1     12.0      5.7       221(S)
                    experimental    15.73    11.96     5.82      221(S)
QOYSIG (27)         lowest energy   16.7     13.9      5.2       221(S)
                    experimental    14.06    11.64     7.68      221(M)
NADVUJ (28)         lowest energy   11.9     10.6      6.0       221(S)
                    experimental    11.87    10.43     5.80      221(S)
PIKDUI (29)         lowest energy   14.5     11.8      9.6       221(L)
                    experimental    14.61    11.77     9.77      221(L)

Remaining columns: no. of times correct solution found at lowest energy, using estimated cell dimensions; no. of trials (out of 4) that gave correct solution using cell axes bounds of 3-30 Å. Entries in printed order: 1, 0, 3, 3, 0 (solution present in 114(S) run), 1, 0, 2, 0, 0, 1, 1, 0 (solution present in 221(L) run), 5, 0, 1, 0, 3.

a The experimental cell (set in italics in the original) is given on the second line for each refcode.
established so that each molecular dimension was paired with a particular cell axis. Using this information, correlations between molecular dimensions and unit cell axes were probed directly. It was established earlier that cells that are described with the minimum repeats of the longest molecular dimensions and the maximum number of repeats of the shortest molecular dimensions (i.e., 221(L), 112(S), 114(S), 1(L)2(M)4(S)) are preferred, but it is not known which particular cell axes, if any, are described by these combinations. In space group P21/c, the cell axes are not equivalent; for example, the screw axes are parallel to b and the glide plane is in the c direction. It is observed for structures belonging to P21/c, in all pattern categories, that the most populated patterns are those in which the cell axis c coincides with a repeated molecular dimension (pattern coefficient ∼1.6 or ∼3.2) and cell axis a corresponds to a single instance of a molecular dimension. Few structures are observed in the reverse situation when cell axis a coincides with multiples of a molecular dimension and cell axis c corresponds to a single molecular dimension. In C2/c, for structures belonging to the 124 pattern family it is observed that the single molecular dimension is most often found aligned with the b axis, the axis parallel to the 2-fold rotation axis. In the 222 pattern category in which every cell axis corresponds to multiple molecular dimensions, the least populated combinations are when b is aligned with ∼2L and the most populated are found when b is aligned with ∼2S. In P21 the screw axis runs parallel to the b axis and the a and c cell axes are equivalent and can be arbitrarily assigned. The most populated combinations of cell axis and molecular axis are when the shortest molecular dimension is aligned with b and hence L and M are oriented in the ac plane. 
The least populated combinations are found when cell axis b coincides with the longest molecular dimension, and therefore M and S are in the ac plane. In space groups P1̄ and P212121, the cell axes are equivalent; thus, correlations between cell axes and molecular dimensions are meaningless. The above correlations between cell axes and molecular dimensions are space group specific: a more detailed analysis to probe relationships between symmetry operators and packing patterns is outside the scope of this paper. However, the above results do indicate that, perhaps unsurprisingly, there are some symmetry operators that accommodate repeated molecular dimensions better than others.

Estimation of the Lengths of Unit Cell Axes. A simple test to establish the utility of the pattern coefficients was to estimate unit cell dimensions from molecular dimensions of a set of known structures found in the CSD. The molecular dimensions for 2755 Z = 4 structures in P21/c (that were not included in the dataset used to determine pattern coefficients, above) were determined, and unit cells were calculated by applying the patterns 221(L), 221(M), 221(S), and 114(S). The lengths of the cell axes were judged to have been estimated correctly if the cell dimensions fell within $(C_{L,M,S} \pm 2\sigma)\,D_{\mathrm{mol}}$, where $D_{\mathrm{mol}} = L, M, S$ and σ is the standard deviation (from Tables 1-4). Of the 2755 structures, the cells of 2548 (92%) were correctly estimated. For a dataset of 1921 Z = 2 structures belonging to P1̄, using the same criterion for successful prediction as above, and applying the patterns 112(L), 112(M), and 112(S), the cells were correctly estimated for 1806 (94%) structures. Therefore, it appears that the pattern coefficients are very useful for establishing bounds in which the experimental cell resides.

Crystal Structure Prediction Trials. The ability to estimate cell dimensions from molecular dimensions has implications for the field of crystal structure prediction. Currently, the search space of a typical trial is approximately 30 × 30 × 30 Å, a volume of 27 000 Å³. A typical, experimentally observed cell has a volume of approximately 2500 Å³. Thus, the reduction in search space to a realistic unit cell will be valuable in terms of
the reduction in time required to perform a trial as well as the expected increase in success rate. Crystal structure prediction trials have been run using lengths of cell axes estimated from the molecular dimensions using the ±2σ limit. A dataset of 96 molecules containing C and H only, belonging to P212121, Z = 4, Z′ = 1, with one chemical entity in the cell, was extracted from the CSD. Nine molecules from this subset were chosen at random, and cell dimensions were estimated using the pattern types 221(L), 221(M), 221(S), and 114(S) and the procedure outlined above. RANCEL (19), a program that uses a genetic algorithm to search for a minimum in a fitness function calculated from intermolecular atomic distances, was used for the crystal structure prediction trials. The packing energy was calculated for all coordination shell molecules within a radius of the longest molecular dimension + 6 Å from the “base” molecule, using empirical potentials from Gavezzotti (20). The input required by RANCEL is a file containing orthogonal coordinates of the molecule (in this case taken from the CSD) and maximum and minimum bounds for cell axes. The conformation of the molecule was fixed, and the positions of the hydrogen atoms were normalized to C-H = 1.083 Å. For each molecule, a run of RANCEL was performed using calculated cell boundaries for each of the four pattern types listed above. Also, for each molecule, four runs of RANCEL were performed using bounds of 3-30 Å for each cell axis for the purpose of comparison over the same number of genetic algorithm generations. Each starting point population in every run of RANCEL is generated from a different set of random numbers. Results and success rates from the two sets of runs are given in Table 9. Tables of molecular dimensions and cell axis bounds for each structure are included in the Supporting Information, Table S5. As can be seen from Table 9, six out of nine structures were predicted successfully (success rate 67%).
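The way cell axis bounds follow from molecular dimensions under the ±2σ criterion can be sketched in a few lines of Python. This is only an illustration: the coefficient and standard deviation values in `COEFFS` are hypothetical placeholders (the actual values are those of Tables 1-4, which are not reproduced here), the molecular dimensions in the usage block are invented, and the function names are not taken from RANCEL or any other program.

```python
# Multiples of (L, M, S) implied by each Z = 4 pattern name used in the text.
PATTERNS = {
    "221(L)": (1, 2, 2),
    "221(M)": (2, 1, 2),
    "221(S)": (2, 2, 1),
    "114(S)": (1, 1, 4),
}

# HYPOTHETICAL (coefficient, sigma) for each multiple; placeholders only.
# The real pattern coefficients are those reported in Tables 1-4.
COEFFS = {1: (1.0, 0.10), 2: (1.6, 0.15), 4: (3.2, 0.30)}


def cell_bounds(dims, pattern, coeffs=COEFFS):
    """Return [(low, high), ...] in Å for the axes paired with L, M and S."""
    bounds = []
    for d_mol, multiple in zip(dims, PATTERNS[pattern]):
        c, sigma = coeffs[multiple]
        # Bounds follow the (C ± 2σ) × Dmol criterion.
        bounds.append(((c - 2 * sigma) * d_mol, (c + 2 * sigma) * d_mol))
    return bounds


def search_volume(bounds):
    """Volume (Å^3) of the box spanned by the upper bounds of the three axes."""
    volume = 1.0
    for _, high in bounds:
        volume *= high
    return volume


if __name__ == "__main__":
    dims = (12.0, 8.0, 4.0)  # hypothetical molecular dimensions L, M, S in Å
    for name in PATTERNS:
        b = cell_bounds(dims, name)
        print(name, [(round(lo, 1), round(hi, 1)) for lo, hi in b],
              round(search_volume(b)))
```

Under these placeholder values each pattern yields a search box far smaller than the 27 000 Å³ spanned by 30 Å limits on every axis, which is the kind of reduction the trials rely on.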
In each of the six successful predictions, the correct structure was the lowest energy structure. In two of the three unsuccessful trials, the correct solution was present but was not found at the lowest energy. In all of the above trials, the correct cell axis dimensions were encompassed by the cell boundaries calculated for one or more of the pattern types. These results are encouraging. It appears that given the correct molecular conformation, the correct space group, and a reduced search space, the correct crystal structure corresponds to the lowest energy predicted structure in the majority of cases. The trials performed using bounds on each cell axis of 3-30 Å were also reasonably successful. Of the nine structures, the correct structure was found at the lowest energy in four of them (success rate 44%). In general, the correct solution was not found in every one of the four trials undertaken per structure; rather, the solution appeared once in one, two, or three of the trials. Clearly, there is a great deal more work to be done to establish if packing patterns are useful in predicting the correct Z, the correct space group, or the correct structure. However, preliminary results indicate that the correct solution is likely to be found encompassed by the estimated cell dimensions. It is envisaged that as correlations between molecular shape and packing pattern or packing pattern and cell axes are
developed, the search space for a crystal structure prediction trial will be reduced further.

Conclusions

A conceptually very simple model has been described, and it has been shown that the packing patterns identified provide a sound basis for the description of unit cells and hence crystal structures of the most common space groups. The model is not space group specific, and there are only a limited number of packing patterns for each of Z = 2, 4, or 8 structures. Thus, a new method for classifying crystal structures has been introduced: the packing pattern names encapsulate information regarding the spatial orientation of molecules within the unit cell and the shape of the unit cell. For example, the packing pattern 221(L) describes a Z = 4 structure in which molecules with dimensions L > M > S are stacked in a 2M × 2S × 1L array. The differences between polymorphic structures are difficult to summarize, but packing pattern classification could provide a starting point for such descriptions. Unit cell dimensions can be described through the combination of packing patterns (with pattern coefficients) and molecular dimensions. The ability to estimate unit cell dimensions from molecular dimensions has applications in a number of areas including crystal structure prediction and X-ray powder pattern indexing. The reduction of search space afforded by estimated cell dimensions has been shown to increase the success rate (from 44% to 67%, a relative increase of 50%) of the limited number of CSP trials undertaken here, using rigid C,H-only molecules. It is hoped that as correlations between molecular shape and packing pattern and between packing pattern and space group are elucidated, further reductions in search space can be achieved. Although the application of estimated cell dimensions to powder pattern indexing is as yet untested, it offers the possibility of a new approach to what can be a difficult problem.
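The box picture behind the pattern names can be made concrete with a short Python sketch that builds the container for each Z = 4 pattern and compares surface areas at fixed volume. It assumes the naming convention implied by the text: in 221(x) the named dimension appears once and the other two are doubled, and in 114(x) the named dimension is repeated four times. The function names are illustrative only.

```python
def container(pattern, l, m, s):
    """Edge lengths of the container for a Z = 4 pattern name such as '221(l)'."""
    family, unique = pattern[:3], pattern[4]
    dims = {"l": l, "m": m, "s": s}
    if family == "221":
        # The named dimension appears once; the other two are doubled.
        return tuple(d * (1 if k == unique else 2) for k, d in dims.items())
    if family == "114":
        # The named dimension is repeated four times; the others appear once.
        return tuple(d * (4 if k == unique else 1) for k, d in dims.items())
    raise ValueError(f"unknown pattern: {pattern}")


def surface_area(edges):
    """Surface area of a rectangular container with the given edge lengths."""
    a, b, c = edges
    return 2 * (a * b + a * c + b * c)


Z4_PATTERNS = ["221(l)", "221(m)", "221(s)", "114(l)", "114(m)", "114(s)"]


def lowest_surface_pattern(l, m, s):
    """Name of the Z = 4 pattern whose container has the smallest surface area."""
    return min(Z4_PATTERNS, key=lambda p: surface_area(container(p, l, m, s)))


if __name__ == "__main__":
    for box in [(4, 3, 2), (4, 3, 1)]:
        areas = {p: surface_area(container(p, *box)) for p in Z4_PATTERNS}
        print(box, areas, "->", lowest_surface_pattern(*box))
```

Every container holds four boxes and so has the same volume, 4lms. The surface areas of 221(l) and 114(s) differ by 2l(m - 2s), so the sketch switches from 221(l) to 114(s) as the lowest surface area pattern once m exceeds 2s, as stated for boxes earlier in the text.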
Pattern coefficients that describe the multiples of molecular dimensions are reasonably consistent across space groups and packing patterns. This indicates that there is an underlying description of crystal structures that is based primarily on molecular shape and is somewhat removed from the detailed consideration of intermolecular energetic interactions and space group symmetry operators. It has been observed that packing patterns that generate containers of the lowest surface area are populated to a greater extent than packing patterns that yield containers with a large surface area for a given volume. This result may have implications in the field of polymorphism. Of the two pairs of polymorphic structures mentioned above, BIGTIU/01 and ATPRCL01/10, the more stable form of each pair corresponds to the structure associated with the lowest surface area packing pattern. Currently, no conclusions can be drawn, but an examination of polymorphic structures with respect to packing pattern, surface area, and the stability of the different forms is underway. Low surface area packing patterns are favored because, although “cubic” molecules are not discriminating in which packing pattern they adopt, molecules with a high surface area for a given volume (rods and disks) are more discriminating and tend to favor packing patterns that produce unit cells that are
the most cubic. An alternate statement of this principle of “maximum cubicity” is that within the container, the amount of surface area of a box in contact with neighboring boxes is maximized. For example, inspection of Figure 1 shows that in the packing pattern 221(l) the largest faces of the boxes are in contact with each other, unlike in packing pattern 221(s). This result has quite far reaching implications. Not only is it possible to identify likely packing patterns based on a consideration of the shape of a molecule, but these results perhaps hint at a process that occurs during nucleation. Is there a driving force for a molecule with a large surface area for a given volume to aggregate with other molecules in such a way as to reduce the ratio of surface area to volume?

Supporting Information Available: Four tables of pattern coefficients calculated using the Euclidean distance method for Z = 4, 2, and 8 structures, and a table of cell axes bounds used in crystal structure prediction trials. This material is available free of charge via the Internet at http://pubs.acs.org.
References

(1) Bernstein, J. Polymorphism in Molecular Crystals; Clarendon Press: Oxford, 2002.
(2) Motherwell, W. D. S.; Ammon, H. L.; Dunitz, J. D.; Dzyabchenko, A.; Erk, P.; Gavezzotti, A.; Hofmann, D. W. M.; Leusen, F. J. J.; Lommerse, J. P. M.; Mooij, W. T. M.; Price, S. L.; Scheraga, H.; Schweizer, B.; Schmidt, M. U.; van Eijck, B. P.; Verwer, P.; Williams, D. E. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 647-661.
(3) Lommerse, J. P. M.; Motherwell, W. D. S.; Ammon, H. L.; Dunitz, J. D.; Gavezzotti, A.; Hofmann, D. W. M.; Leusen, F. J. J.; Mooij, W. T. M.; Price, S. L.; Schweizer, B.; Schmidt, M. U.; van Eijck, B. P.; Verwer, P.; Williams, D. E. Acta Crystallogr., Sect. B: Struct. Sci. 2000, 56, 697-714.
(4) Dunitz, J. D. Chem. Commun. 2003, 5, 545-548.
(5) Allen, F. H. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 380-388.
(6) Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 389-397.
(7) Allen, F. H.; Motherwell, W. D. S. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 407-422 and references therein.
(8) Pidcock, E.; Motherwell, W. D. S. Chem. Commun. 2003, 5, 3028-3029.
(9) Magnus, P.; Waring, M. J.; Ollivier, C.; Lynch, V. Tetrahedron Lett. 2001, 42, 4947.
(10) Motherwell, W. D. S.; Shields, G. P.; Allen, F. H. Acta Crystallogr., Sect. B: Struct. Sci. 1999, 55, 1044-1056.
(11) STATISTICA Neural Networks. StatSoft Inc, Tulsa, OK, USA. 1998.
(12) Kohonen, T. Self-Organizing Maps; Springer: Berlin, 1997.
(13) Kitaigorodskii, A. I. Organic Chemical Crystallography; Consultants Bureau: New York, 1961.
(14) Vojtechovsky, J.; Hasek, J. Acta Crystallogr., Sect. C: Cryst. Struct. Commun. 1990, 46, 1727.
(15) Declercq, J. P.; Germain, G.; van Meerssche, M.; L’abbe, G. Bull. Soc. Chim. Belg. 1978, 87, 237.
(16) Cameron, T. S.; Labarre, J.-F.; Graffeuil, M. Acta Crystallogr., Sect. B: Struct. Sci. 1982, 38, 2000.
(17) Agafonov, V.; Legendre, B.; Rodier, N. Acta Crystallogr., Sect. C: Cryst. Struct. Commun. 1989, 45, 1661.
(18) Dideberg, O.; Dupont, L. Acta Crystallogr., Sect. B: Struct. Sci. 1972, 28, 3014.
(19) Motherwell, W. D. S. Mol. Cryst. Liq. Cryst. 2001, 356, 559-567.
(20) Gavezzotti, A. Acc. Chem. Res. 1994, 27, 309-314.
(21) Halterman, R. L.; Fahey, D. R.; Bailly, E. F.; Dockter, W. D.; Stenzel, O.; Shipman, J. L.; Khan, M. A.; Dechert, S.; Schumann, H. Organometallics 2000, 19, 5464.
(22) Beddoes, R. L.; Gorman, A. A.; McNeeney, S. P. Acta Crystallogr., Sect. C: Cryst. Struct. Commun. 1993, 49, 1811.
(23) Haumann, T.; Benet-Buchholz, J.; Boese, R. J. Mol. Struct. 1996, 374, 299.
(24) Erker, G.; Aulbach, M.; Kruger, C.; Werner, S. J. Organomet. Chem. 1993, 450, 1.
(25) Collings, J. C.; Roscoe, K. P.; Thomas, R. L.; Batsanov, A. S.; Stimson, L. M.; Howard, J. A. K.; Marder, T. B. New J. Chem. 2001, 25, 1410.
(26) Schmidbauer, H.; Bublak, W.; Schier, A.; Reber, G.; Muller, G. Chem. Ber. 1988, 121, 1373.
(27) Isaji, H.; Yasutake, M.; Takemura, H.; Sako, K.; Tatemitsu, H.; Inazu, T.; Shinmyozu, T. Eur. J. Org. Chem. 2001, 2487.
(28) Boese, R.; Haumann, T.; Jemmis, E. D.; Kiran, B.; Kozhushkov, S.; de Meijere, A. Liebigs Ann. 1996, 913.
(29) Brooks, P. R.; Bishop, R.; Counter, J. A.; Tiekink, E. R. T. Z. Kristallogr. 1993, 208, 319.