The Supramolecular Synthon Approach to Crystal Structure Prediction

CRYSTAL GROWTH & DESIGN 2002 VOL. 2, NO. 2 93-100

J. A. R. P. Sarma*,† and Gautam R. Desiraju*,‡

† gvk bioSciences Pvt. Ltd., #210, ‘My Home Tycoon’, 6-3-1192, Begumpet, Hyderabad 500 016, India
‡ School of Chemistry, University of Hyderabad, Hyderabad 500 046, India

Received December 21, 2001; Revised Manuscript Received January 11, 2002

This paper contains enhanced objects available on the Internet at http://pubs.acs.org/crystal.

ABSTRACT: A new approach has been proposed for the ab initio crystal structure prediction of small organic molecules. This exercise forms a part of the recent blind test on crystal structure prediction conducted by the Cambridge Crystallographic Data Centre. The method uses as a starting point lists of low energy structures generated by an exhaustive computational procedure, namely, the Polymorph Predictor program in Cerius2. Such computational procedures take into account only the enthalpic factors in crystallization. A further difficulty is that information relating to crystallization kinetics is very hard to obtain directly. However, such kinetic information is implicitly contained in the experimental structures that are found in crystallographic databases. Therefore, in our approach, the low energy structures obtained in the Polymorph Predictor program are reranked after consideration of experimental structures of structurally similar molecules. Operationally, this is most conveniently carried out after identification of possible supramolecular synthons in the Cambridge Structural Database. These synthons are representative structural units that convey critical information that relates isolated molecules with their resulting crystal structures. Of the three molecules in the blind test, the present approach was fully successful for one, but only of limited utility in the two others. Reasons for this variability of success are given.

Introduction

A core question in crystal engineering is: given the molecular structure of an organic substance, what is its crystal structure?1 The purpose of this question seems to be obvious, but a general answer is not available at the present time.2 The difficulties in this regard stem from two counts.
The first relates to the complementary nature of the recognition phenomenon.3,4 This complementarity is characteristic of both geometrical and chemical recognition and renders a functional group or modular approach to organic crystal structure prediction (CSP) essentially futile.5 For instance, carboxylic acids form dimers, catemers, solvates, or hydrates depending on other functional groups present in the molecule. A crystal is truly a supramolecular entity, and the molecules that constitute it interact implicitly among themselves during crystallization. The smallest of changes, small that is in molecular terms, can cause profound changes in the crystal structure that is adopted.

The second difficulty in predicting crystal structure lies in the fact that although the vast majority of organic structures tend toward close packing, there are subtle deviations from exact close packing in various regions of the crystal. The reason for these deviations is the anisotropic and long-range nature of many common heteroatom interactions, most notably, hydrogen bonding.6 In theory, a large number of nearly close packed structures, almost similar in energy, would seem to be possible. In practice, one or a small number of structures are realized experimentally. The weak interactions exert an influence during crystallization that is out of all proportion to their final contribution to the crystal stabilization energy. Weakness of interaction does not generally lead to lack of specificity in recognition. Both these problems are more or less pervasive in all non-hydrocarbon crystals. For hydrocarbons, intermolecular interactions are exclusively of the van der Waals type and crystal structures may be predicted computationally, or in special cases, even by inspection.7 In the more numerous heteroatom-containing structures, on the other hand, the interactions are a complex mosaic of varying strengths, directionalities, and distance-dependent characteristics.8 Since all presently used

10.1021/cg015576u CCC: $22.00 © 2002 American Chemical Society Published on Web 02/13/2002


computational methods for CSP rely on atom potentials,9 the following problems are unavoidable: (i) lack of transferability of isotropic potentials; (ii) our still-evolving knowledge of anisotropic potentials; (iii) lack of reliable prescriptions for the charges that need to be used in any given situation; (iv) treatment of the interactions as occurring between two bodies rather than as multibody phenomena. In general, all this follows from the fact that an extremely complex chemical event, namely, crystallization, is being treated by what is essentially an empirical method. Even if these difficult issues could be tackled, there still remains a formidable problem. Crystallization is a supramolecular reaction and, as such, it is governed by both thermodynamic and kinetic factors. In other words, experimental structures are not only stable but they should also be attainable. However, methods for CSP based wholly on the calibration and use of accurate atom potentials take into account only the thermodynamic (enthalpic) factors. Matters concerning kinetics and entropy are not really taken into account.10

The present paper is an attempt to include these issues within CSP and relies on the fact that all this chemical information pertaining to the crystallization event is implicit in the body of crystal structure data currently available. The natural source of this information is the Cambridge Structural Database (CSD).11 With the CSD, one is able to examine the recognition information in molecular crystals, more specifically, the arrangements and frequencies of occurrence of the supramolecular synthons,12 which are the key structural elements relating molecular structure to crystal structure. This paper therefore suggests that the question “molecule → crystal” be expanded to the question “molecule → synthon → crystal”. With such an expansion, the problem of CSP is simplified.

The reader should note here that by CSP, what is meant is the prediction of an unknown crystal structure in a complete fashion. This includes unit cell parameters to within a few percent tolerance, space group, and all other crystallographic details as are obtained after structure solution with a single crystal X-ray diffractometer. In other words, CSP in the present context does not refer just to the anticipation of broad structural patterns, motifs, and so on, as is now common practice in experimental crystal engineering. This distinction between CSP and mainline crystal engineering has been clearly stated by Moulton and Zaworotko in a recent review.13 They argue persuasively that while CSP is more precise, crystal engineering is less restrictive from a conceptual perspective. In any case, the aims of CSP are far more stringent and its successful accomplishment is expected to be correspondingly more difficult. CSP or ab initio crystal structure prediction may truly be likened to the Mount Everest of crystal engineering


and supramolecular synthesis. Because the problem is of nearly hopeless difficulty and no general solution is yet available, blind tests toward this end were conducted by the Cambridge Crystallographic Data Centre (CCDC) in 1999 and 2001.14 We describe now our efforts with molecules I, II, and III, which constitute the CSP2001 test.

Results

General Considerations. Some details of the blind test are pertinent. The crystal structures of molecules I-III were determined previously, and the results were held in confidence by an independent referee. To avoid undue complexity, the number of non-hydrogen atoms in the molecule was kept below 20. For similar reasons, the crystal structures were restricted to the 10 most common space groups, to have only a single molecule in the asymmetric unit, and to be ordered and unsolvated. Around 18 participants were invited to submit up to three predictions for each molecular structure in order of likelihood, giving reasons for their ranking. The molecular formulas were supplied in November 2000, and the deadline for submission of the results was March 2001.

Our approach toward CSP proceeded in two stages. The first stage is purely computational and relies on the Polymorph Predictor (PP) software of Accelrys to generate a large number of plausible crystal structures. This Monte Carlo based method of generating, clustering, minimizing, and reclustering crystal structures is well described elsewhere.15,16 For the purpose of CSP2001, we formed a loose collaboration with Dr. Frank Leusen (L) of Accelrys, Cambridge, and Dr. Paul Verwer (V) of the University of Utrecht, who independently participated in the test. We obtained lists of the top 1000 crystal structures from them for each of the three molecules in their respective PP outputs (L and V) sorted on the basis of maximum density and minimum energy.
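As a hedged illustration of what such a sorted list amounts to, the sketch below ranks a handful of candidate structures by minimum energy, using maximum density only to break ties. The `Candidate` class, the `rank_candidates` helper, the tie-breaking order, and all numerical values are assumptions made for this example; the text does not say how PP combines the two criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str      # hypothetical identifier of a generated structure
    energy: float   # packing energy per molecule, kcal/mol (lower is better)
    density: float  # calculated density, g/cm^3 (higher is better)


def rank_candidates(candidates, top_n=1000):
    """Sort by minimum energy, break ties by maximum density, keep top_n."""
    ordered = sorted(candidates, key=lambda c: (c.energy, -c.density))
    return ordered[:top_n]


# Invented values, for illustration only (not taken from any PP output).
pool = [
    Candidate("s1", -18.6, 1.31),
    Candidate("s2", -19.4, 1.32),
    Candidate("s3", -19.4, 1.28),
    Candidate("s4", -17.4, 1.24),
]
ranked_labels = [c.label for c in rank_candidates(pool, top_n=3)]
```

In this toy pool the two structures of equal energy are separated by density, and the least stable structure falls outside the retained list.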
Whereas L and V submitted the three lowest energy structures obtained by them as their three solutions in that order, we diverged at this point and used crystal structure information obtained from the CSD in the second stage. We proceeded on the following assumptions: (1) the actual structure of the test molecule or nearly such a structure will be found somewhere in the lists of structures generated by the PP program; (2) the ranking of a particular structure in the PP may need to be modified with the increasing importance of kinetic factors; (3) there are critical features in the molecular structure that lead to robust supramolecular synthons during crystallization, and if such molecular features are correctly identified, prediction of crystal packing becomes easier.

Our method consisted of the following steps, which are described in detail below.
(1) Selection of a set of molecules from the CSD that are considered to be the most similar to the test molecule.
(2) Identification of supramolecular synthons in these selected molecules. These synthons form the core of the new structure that is generated for the test molecule.
(3) Unbuilding of an experimental structure to a single molecule in the unit cell, and removal of those molecular fragments not contained in the test molecule, followed by minimization into the test molecule.
(4) Rebuilding and minimization of the crystal structure of the test molecule from the unbuilt structure.
(5) Comparison of this derived structure with those generated by the PP outputs of L and V.
(6) Choice of the solution.
Of these steps, 3, 4, and 5 are carried out computationally with the Cerius2 program of Accelrys. Steps 1, 2, and 6 require critical manual inputs and constitute the heart of the supramolecular synthon approach to CSP, which we now illustrate with molecule I, a simple bicyclic imide.

Figure 1. Fragment and the CSD query used in the CSP of molecule I.

Scheme 1. Twelve Molecules, Dissimilar to I, that Were Excluded from Further Consideration, Thereby Reducing the Number of “Similar” Molecules from 22 to 10

Step 1. Selection of Similar Molecules. This step is, in practice, the most crucial aspect of the whole exercise because the ultimate solution is strongly biased by this operation. The initial search fragment and the search query are given in Figure 1 while the refcodes of the obtained hits are given in the Supporting Information. The query generated 88 hits, of which 44 were discarded because they contain aromatic and heterocyclic residues that were expected to influence the crystal packing strongly. Another 22 were removed because they are bis-imides (10), or contain other strong hydrogen bond donors and/or acceptors (12). These refcodes are given in the Supporting Information. Of the 22 structures remaining, refcodes BEWWIJ, BXIQCN, CHEXIM, COHTAU, NISFIE, RERYES, REWWOF, SUCCIN, SUCCIN01, SUCCIN02, TEKQAB, and WINXIA (Scheme 1) were further removed because they have one or more molecular attributes that were expected to lead to features in the crystal packing grossly different from those anticipated for the test molecule. For example, the overall shapes of BEWWIJ, CHEXIM, and REWWOF are very different from molecule I. Again, COHTAU, NISFIE, and RERYES do not contain the critical methine H-atoms in the C(α) position. These H-atoms were considered important for two reasons: (1)

Scheme 2. The Ten CSD Structures Deemed to Be the Most Similar to Molecule Iᵃ

ᵃ Crystal structures of molecule I derived from the crystal structures of these molecules are compared with the PP generated lists of Leusen and Verwer, and the corresponding ranking is given along with the type of hydrogen-bonded synthon in each case.

their propensity for forming C-H...O interactions because of their activation; (2) their steric effect on the N-H...O hydrogen bond pattern formed. YITBUY and YUFNES, which lack C(α) H-atoms, were, however, retained at this stage because their shape and size are close to molecule I. Succinimide (SUCCIN, SUCCIN01, SUCCIN02) and maleimide (TEKQAB) are more interesting. These molecules were considered to be too simple and too symmetrical to be proper models for the test molecule. Their packing is of the layer type, and one cannot really expect that it would be reproduced in the crystal structure of the nonplanar molecule I. The 10 refcodes that finally remained were considered seriously in terms of the known crystal chemistry of the various functionalities present (Scheme 2). Figure 2 is a plot of molecular area versus molecular volume (both calculated with Cerius2) for these compounds. The reader will note that within a group of chemically related molecules, such an area/volume plot represents an index of molecular similarity. It will be seen that four compounds, AZMCHO, DTHPIM, PHYPHM, and YUFNES, compare favorably with molecule I.

Figure 2. Differences in molecular area (Å²) and volume (Å³) between molecule I and the ten CSD molecules closest to it in shape and size.

Figure 3. Dimer and catemer synthons formed by the imide fragment in Figure 1.

Step 2. Identification of Supramolecular Synthons. Hydrogen bonding was expected to be an important factor in the crystal structure of molecule I, and therefore the search query was based on the imide fragment (Figure 1). This fragment can lead to two possibilities for the hydrogen bond arrangement. These are the N-H...O dimer and catemer synthons (Figure 3). At this stage, we were prepared to consider both possibilities, but it was noted that of the four closest matches mentioned above, three are catemeric and only

one is a dimer. Further, of the 10 matches for the molecules in Scheme 2, seven are catemers and three are dimers. Therefore, there seems to be a statistical preference for catemeric synthons.

Steps 3, 4, and 5. Unbuilding, Rebuilding, Minimization, and Comparison. These steps proceeded according to the Crystal Builder module in Cerius2. Initially, the experimental structure in the CSD was unbuilt, and the molecule was changed to molecule I. The potentials and charges assigned to molecule I were in accordance with the final list of PP structures to be compared, that is, corresponding to either PP-CVFF (L) or to PP-Dreiding/Charges (V). The molecule was minimized, and the crystal was rebuilt. Further crystal minimization was carried out until convergence. Comparison of the rebuilt structure with those in the PP lists followed the PP protocols. This comparison is based on crystal packing and not on unit cell parameters. Deviation between the two crystal packings is expressed in terms of the RMS deviation factor. Scheme 2 gives the rankings for each of the 10 final structures in the L and V listings. Significantly, an initial experimental catemer structure in the CSD remained a catemer throughout the unbuilding, minimization, rebuilding, and minimization process even though no constraints were applied during these operations. Similarly, an initial dimer structure remained as a dimer.

Step 6. Choice of Solution. With respect to the Leusen outputs, MOPRDB (L 1), DTHPIM (L 3 and L 5), PHYPHM (L 8), YITBUY (L 20), and VOBDEV (L 22) led to structures of the test molecule I that appeared toward the top of the list. With respect to the Verwer outputs, only structures derived from MOPRDB (V 2) and the isomeric MOPRDA (V 14 and V 15) gave high ranks. That none of the 10 compounds led to structures with a ranking worse than 547 in either PP set indicated to us that both force fields are reasonably accurate. Let us now consider the catemer-dimer question.
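The Leusen ranks just quoted can be combined with the shape-similarity screen of Step 1 in a very simple filter. The sketch below is only a hedged reading of what was, in practice, a manual judgment: it keeps the derived structures whose parent refcode is among the four closest to molecule I in Figure 2 and then takes the best rank. The names `leusen_rank`, `shape_similar`, and `first_choice` are hypothetical, and only ranks stated in the text are used (AZMCHO and YUFNES are not among the high ranks listed, so they do not appear in the dictionary).

```python
# Best rank in the Leusen (L) list for the structure of molecule I derived
# from each CSD refcode, as quoted in the text (DTHPIM also matched L 5).
leusen_rank = {"MOPRDB": 1, "DTHPIM": 3, "PHYPHM": 8, "YITBUY": 20, "VOBDEV": 22}

# Refcodes said to compare favorably with molecule I in the area/volume plot.
shape_similar = {"AZMCHO", "DTHPIM", "PHYPHM", "YUFNES"}


def first_choice(ranks, similar):
    """Return the shape-similar refcode with the best (lowest) rank."""
    eligible = {ref: rank for ref, rank in ranks.items() if ref in similar}
    return min(eligible, key=eligible.get)


choice = first_choice(leusen_rank, shape_similar)
unfiltered = min(leusen_rank, key=leusen_rank.get)
```

With these inputs the filter selects the DTHPIM-derived structure, whereas ranking alone would select the MOPRDB-derived one; the difference between the two is exactly the catemer-dimer question.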
It should be noted that of all the high ranks, only MOPRDB led to a dimer. However, the shape and size matching of MOPRDB with molecule I is poor as may be seen in Figure 2. We settled for a catemer structure for imide I based on the following: (1) the molecule is reasonably small and the hydrogen bond functionalities are free of steric hindrance from neighboring groups; (2) the molecule is aliphatic; and (3) the molecule does not contain large and/or floppy substituents. Our choice is further confirmed by an inspection of Figure 2. DTHPIM and PHYPHM are similar to the test molecule in area and volume; both lead to high ranks and both are catemers. YUFNES, which is similar in area and volume, leads to a low ranked structure, but it is a dimer. The geometrically dissimilar MOPRDB leads to a dimer structure but now of high rank. These latter observations on structures derived from YUFNES and MOPRDB when taken together indicated to us that the dimer option is a poor one. A conjunction of a high rank and molecular similarity appeared to be the most critical criterion, and our selection of the catemer structure derived from DTHPIM (L 3) as our first choice is consistent with such a criterion. Our second choice was the C-centered version of the same (L 5), and the third choice was the dimeric structure derived from MOPRDB, selected only because it had a rank of (L 1) and (V 2).

We were gratified to note that our solution was one of the few correct solutions for molecule I, and indeed the only solution that was assigned the first of the three ranks allowed by the rules of the CSP. A stereoview of the crystal structure is given in Figure 4. The cell parameters of the predicted and actual (in parentheses) structures are P21/c, a = 8.02 (7.71), b = 10.51 (10.61), c = 9.18 (9.34), and β = 96.98 (95.03). Density and packing energy values (Table 1) also indicate that the structure generated from DTHPIM is very close to the experimental structure. The next best structure is the one derived from PHYPHM. Although the structure based on MOPRDB is closely related to (L 1), the RMS factor is rather high, indicating that the similarity between the CSD and PP structures is actually quite limited. All these factors had been considered suitably while making the final selection.

Figure 4. Superposition of the calculated and experimental crystal structures for imide I (stereoview). 3D rotatable images of the predicted and experimental structures in pdb format are available.

Table 1. Some Characteristics of Computationally Derived Structures for Test Molecule Iᵃ

structure of I
derived from   space      cell parameters               density    PE per molecule   RMSᵇ
refcode        group      a, b, c (Å), β (deg)          (g cm⁻³)   (kcal/mol)        factor
AZMCHO         Pccn       9.22, 21.61, 8.11             1.26       -15.62            0.23
               Pbcaᶜ      6.04, 22.19, 12.75            1.19       -16.42
DTHPIM         P21/c      9.18, 10.51, 8.02, 96.98      1.32       -19.43            0.03
               P21/c      9.18, 10.51, 8.02, 96.98      1.32       -19.43
molecule I     P21/a      7.71, 10.61, 9.34, 95.03      1.34       -19.45ᵈ
EACLEZ         P212121    8.19, 10.26, 9.27             1.31       -18.63            0.03
               P212121    8.19, 10.26, 9.27             1.31       -18.63
GLUTIM         P21/c      10.61, 7.99, 9.46, 92.56      1.27       -18.02            0.03
               P21/c      10.61, 7.99, 9.46, 92.56      1.27       -18.02
MOPRDA         P21/c      12.06, 11.08, 6.57, 111.83    1.24       -17.40            0.17
               P21/c      11.50, 11.08, 6.57, 78.54     1.24       -17.40
MOPRDB         P21/c      10.89, 7.40, 10.12, 97.59     1.26       -17.70            0.21
               P21/c      10.47, 7.60, 9.96, 105.16     1.33       -19.65
PHYPHM         P212121    5.92, 10.20, 12.86            1.31       -19.22            0.03
               P212121    5.92, 10.20, 12.86            1.31       -19.22
VOBDEV         P21/c      9.61, 7.11, 11.61, 84.22      1.29       -18.80            0.03
               P21/c      9.61, 7.11, 11.61, 84.22      1.29       -18.80
YITBUY         P212121    6.15, 15.64, 8.07             1.31       -18.82            0.03
               P212121    6.15, 15.64, 8.07             1.31       -18.82
YUFNES         P21/n      9.57, 9.11, 9.37, 103.12      1.28       -18.48            0.03
               P21/n      9.57, 9.11, 9.37, 103.12      1.28       -18.48

ᵃ For each CSD refcode, there are two lines of numerical values. The first line corresponds to the structure derived from the experimental structure in the CSD by unbuilding, rebuilding, and minimization. The second line corresponds to the equivalent structure as obtained in the PP of Leusen’s list. ᵇ RMS factor indicates the extent of deviation in crystal packing between the CSD and Leusen’s PP generated structures. ᶜ The space group is changed to Pbca because Pccn does not occur in the list of the most common space groups, which are a prerequisite for I-III according to the blind test protocols. ᵈ The PE is obtained after minimizing the experimental structure without varying the cell parameters.

Scheme 3. Structural Variations in Molecule III: (a) Tautomeric Equilibrium, (b) Conformational Equilibrium, (c) Experimentally Observed Conformation

Scheme 4. Two Major Synthons Observed in PP Simulations of Molecule IIIᵃ

Molecules II and III. Molecule II is poorly functionalized, and even identification of a robust supramolecular synthon proved to be difficult. Our CSD query was based on the five-membered heterocyclic ring, but we were well aware that this particular molecular fragment might not be important as a crystal structure director. It is merely exotic in a molecular sense. As mentioned above, the initial selection of similar molecules is so critical to our method that when this step is ambiguous, the rest of the exercise may become unproductive. Our experiences with molecule III were even more daunting. We (and others who participated in the blind test) noted that there is a possibility of both geometrical isomerism and conformational variability in this molecule (Scheme 3). There are also some tautomeric considerations. Upon inquiry, the independent referee informed us that the structure consists of molecules in the quinonoid form. Further, we noted a similarity between molecule III and those studied by Bar and Bernstein,17 and the four conformations selected by us correspond to those in this earlier study. The supramolecular synthons proposed for this structure are given in Scheme 4. However, and in hindsight, the experimental structure of molecule III has a unique molecular conformation (Scheme 3c and Figure 5), where the S-phenyl group eclipses the S-N bond, a conformation hitherto unobserved among the many related compounds in the CSD. Therefore, our efforts with this molecule were bound to be unsuccessful. Still, it was noted that the experimental structure contains synthon B given in Scheme 4, which was recognized based on our CSD study of related compounds. Significantly, synthon B is observed in our second and third choices of predicted structure. Synthon A, which was found in our first choice, did not occur in the experimental structure. The energy difference between the experimentally observed conformation and the commonly observed conformation is nearly 3.0 kcal/mol, with the latter being more stable.
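To give a feeling for the size of this gap, a simple two-state Boltzmann estimate can be made. This is only a sketch under stated assumptions (an isolated molecule, two conformers of equal degeneracy, room temperature, and the quoted 3.0 kcal/mol taken at face value); it is not a calculation reported in this work, and the function name `minor_population` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def minor_population(delta_e_kcal, temperature_k=298.15):
    """Two-state Boltzmann fraction of the higher-energy conformer."""
    ratio = math.exp(-delta_e_kcal / (R_KCAL * temperature_k))
    return ratio / (1.0 + ratio)


# Energy gap quoted in the text; temperature is an assumed value.
p_unusual = minor_population(3.0)
```

Under these assumptions the less stable conformer would account for well under 1% of an equilibrium population, which is consistent with its absence from the related CSD structures.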
The experimental structure of III is thus very unusual, and it is our view that CSP of this molecule is too difficult an exercise for the present.

ᵃ (Scheme 4) The experimental structure contains synthon B.

Figure 5. Histogram of the C-C-S-N torsion angle (indicated bold in the above search fragment) in CSD molecules related to III. This angle is nearly 100° in many cases but in crystalline III it is around 0°.

Discussion

It should be emphasized that the dominant contribution to the crystal energy arises from the enthalpic terms, and that these are adequately treated with the current atom potential methods.9 If the potentials are sufficiently accurate, there is a high probability that the experimental structure will be found at least among the top 1000 structures generated by a program such as PP. So, any attempt at CSP, whether it is of the traditional type or of the knowledge-based type proposed in this paper, depends critically on having the “best” possible potentials. For example, in CSP of molecule I, it was found that the CVFF force field (L) generally gave much lower (that is, better) ranks than Dreiding potentials with high level


ab initio charges (V). This seemed to us to follow from the fact that the CVFF, which is a class I force field, has been derived from amino acids and protein structures wherein N-H...O hydrogen bonding is dominant. A rule-based potential such as Dreiding does not have this advantage. Although the quality of atom potentials has vastly improved in recent years, it is still acknowledged that these potentials are unable to predict experimental crystal structures routinely.14 The global minimum in a computational approach reflects thermodynamic factors, which to a first approximation may be taken as the enthalpic contribution. However, crystallization is in itself entropically disadvantageous. Why does it even begin? During the events preceding nucleation, there should be some kind of a balance between the formation of interactions (enthalpy) and the dynamic nature of the molecular cluster (entropy). This balance depends on the structure of the extended molecular aggregate that is formed. Those aggregates that can be sustained further into the crystallization event are the ones more likely to survive and cross over into the nucleation regime. The “best” aggregates contain the most robust of supramolecular synthons, that is, the ones that would be found most often in real structures. All these phenomena are reflective of the kinetic factors. Experimental crystal structures are formed after molecules survive through all these stages and so in considering them, one is implicitly taking into account the kinetic factors. The fact that each of the 10 structures that were considered to be the most similar to molecule I, led to reasonably well-ranked structures in the PP also suggests a physical meaning for the many hypothetical structures generated by the PP. Most of these probably have an existence in solution in the prenucleation stages, and the worse the ranking the less likely such structures are able to survive. 
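The packing energies in Table 1 give a way to quantify how close these hypothetical structures lie to the observed one. The sketch below is a hedged illustration: the energies are the PP-list (second-line) values as read from the flattened Table 1, so the row assignment is an assumption, and the 1 kcal/mol window and the helper name `within_window` are chosen for this example only.

```python
# PE per molecule (kcal/mol), second (PP-list) line of each refcode in Table 1.
# The assignment of values to refcodes is our reading of the flattened table.
pp_energy = {
    "AZMCHO": -16.42, "DTHPIM": -19.43, "EACLEZ": -18.63, "GLUTIM": -18.02,
    "MOPRDA": -17.40, "MOPRDB": -19.65, "PHYPHM": -19.22, "VOBDEV": -18.80,
    "YITBUY": -18.82, "YUFNES": -18.48,
}
EXPERIMENTAL_PE = -19.45  # molecule I, minimized at fixed cell (Table 1, note d)


def within_window(energies, reference, window=1.0):
    """Refcodes whose energy differs from the reference by less than window."""
    return sorted(ref for ref, e in energies.items() if abs(e - reference) < window)


close = within_window(pp_energy, EXPERIMENTAL_PE)
outside = sorted(set(pp_energy) - set(close))
```

On this reading, the structures derived from AZMCHO, GLUTIM, and MOPRDA are the ones that fall outside the window.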
This fact is further evidenced when the packing energy and density values are observed in Table 1. Seven out of 10 structures have energies within 1 kcal/mol per molecule of the experimental structure and have favorable densities.18 It is a matter for consideration that their nuclei could not propagate as they do not have an optimal enthalpy-entropy balance. Indeed, the problem of CSP is one of reranking rather than ranking. Changes in the ranking, such as taking structure (L 3) and assigning to it rank #1, imply that the energy difference between (L 3) and (L 1) is the minimum difference in free energy arising from the kinetic/entropic factors. In structural terms, this reranking is tantamount to choosing between a dimer and a catemer structure. The dimer/catemer dilemma has been encountered previously with acetic acid.10 High level computations have invariably shown that the dimer structure is the more stable, and yet acetic acid has only one experimentally observed structure, a catemer. In qualitative terms, the dimer is formed easily but cannot propagate easily. Similar considerations also apply to molecule I. It is well-known that hydrogen-bonded catemers form in aliphatic compounds and that too only when steric factors are favorable. The small rigid structure of imide I is easily disposed to such a structure. In YUFNES and RERYES, the presence of flanking methyl groups in the bridgehead positions proves to be too severe a steric constraint, and the dimer structure follows. Of course, such rationalizations could hardly have been used a priori for CSP; however, they corroborate in a chemical way the procedures outlined above.

Similarities between crystallization and protein folding have been noted previously.19 There are obvious features in common in predicting the crystal structure of a small organic molecule and in predicting the tertiary structure of a protein. Both crystallization and folding involve delicate balances between attractions and repulsions at the atomic level, between enthalpic and entropic contributions to the free energy, and between thermodynamic and kinetic factors. Both processes could proceed through kinetically stabilized states, which could be semicompact random globules for proteins20 and supramolecular synthons12 for small molecules. These intermediate states could result in a considerable increase in crystallization or folding efficiency and in the correction of mistakes. They are revealing of the kinetic factors involved. Use of pattern information in predicting protein folding in terms of structurally conserved regions is the main feature of homology modeling.21 Elements of this modular strategy have been used in our synthon based approach to CSP.

Conclusions

The approach proposed here for CSP is a conjunction of computational methods with chemical information. The better the information, the more reliable will be the CSP. In this regard, we note that if the CSD were to be significantly larger than what it is today, say, around a million refcodes, CSP with the synthon-based approach could be successfully employed for a much wider variety of molecules. One could also consider whether there is any need for further fine-tuning of atom potentials, leaving aside the question as to whether it is possible or even meaningful.
A number of PP generated structures using genetic algorithm techniques to model both conformational preferences and crystal structure packing could well be useful as a starting point for difficult cases, such as molecule III. In other words, it may not be realistic to consider these events independently. In summary, and from the viewpoint of CSP, the utilization of structural information could provide a more effective sieve toward the correct solution. As the amount of structural information in crystallographic databases increases, structure prediction would gradually move toward fingerprinting.

Acknowledgment. We would like to express our sincere thanks to Dr. Frank Leusen and Dr. Paul Verwer for making available their lists of computed crystal structures. We are grateful to Dr. Sam Motherwell of CCDC for conducting the blind test and inviting us to participate. One of us (G.R.D.) would like to thank the Department of Science and Technology, Government of India, for financial support, and Accelrys for its continuing cooperation, and J.A.R.P.S. would like to thank Mr. Sanjay Reddy, CEO, gvk bioSciences, for his support.

Supporting Information Available: Lists of CSD refcodes of structures related to molecule I, beginning with the 88 structures that were generated by the query in Scheme 1 and those removed stepwise until the final list of 10 refcodes was obtained. This material is available free of charge via the Internet at http://publs.acs.org.

References

(1) Schwiebert, K. E.; Chin, D. N.; MacDonald, J. C.; Whitesides, G. M. J. Am. Chem. Soc. 1996, 118, 4018-4029.
(2) Gavezzotti, A. Acc. Chem. Res. 1994, 27, 309-314.
(3) Pauling, L.; Delbrück, M. Science 1940, 92, 77-79.
(4) Dunitz, J. D. Crystals as supermolecules. In Desiraju, G. R., Ed.; Perspectives in Supramolecular Chemistry, Vol. 2. The Crystal as a Supramolecular Entity, Wiley: Chichester, 1995; pp 1-30.
(5) Desiraju, G. R. Current Challenges in Crystal Engineering. In Howard, J. A. K.; Allen, F. H.; Shields, G. P., Eds.; Implications of Molecular and Materials Structure for New Technologies; Kluwer: Dordrecht, 1999; pp 321-339; Desiraju, G. R. Crystal Engineering. In Braga, D. et al., Eds.; From Molecules and Crystals to Materials; Kluwer: Dordrecht, 1999; pp 229-241.
(6) Desiraju, G. R. Crystal Engineering: The Design of Organic Solids, Elsevier: New York, 1989.
(7) Desiraju, G. R.; Gavezzotti, A. J. Chem. Soc., Chem. Commun. 1989, 621-623; Bernstein, J.; Sarma, J. A. R. P.; Gavezzotti, A. Chem. Phys. Lett. 1990, 174, 361-368.
(8) Price, S. L.; Beyer, T. Progress and Problems in the Computer Prediction of Molecular Crystal Structures and Polymorphism. In Rogers, R. D.; Zaworotko, M. J., Eds.; Proceedings of the Symposium on Crystal Engineering: ACA Transactions, 1998, 33, 23-31.
(9) Williams, D. E. Acta Crystallogr. 1996, A52, 326-328; van Eijck, B. P.; Kroon, J. Acta Crystallogr. 2000, B56, 535-542; Schmidt, M. U.; Englert, U. J. Chem. Soc., Dalton Trans. 1996, 2077-2082; Ammon, H. L.; Du, Z.; Holden, J. R. J. Comput. Chem. 1993, 14, 422-437; Apostolakis, J.; Hofmann, D. W. M.; Lengauer, T. Acta Crystallogr. 2001, A57, 442-450; Mooij, W. T. M.; van Eijck, B. P.; Kroon, J. J. Phys. Chem. 1999, A103, 9883-9890; Hofmann, D.; Lengauer, T. Acta Crystallogr. 1997, A53, 225-235.
(10) Perhaps the difficulties arising in modeling the experimental crystal structures of acetic acid are owing to such problems. See Mooij, W. T. M.; van Eijck, B. P.; Price, S. L.; Verwer, P.; Kroon, J. J. Comput. Chem. 1998, 19, 459-474.
(11) Allen, F. H.; Kennard, O. Chem. Des. Autom. News 1993, 8, 31-37.
(12) Desiraju, G. R. Angew. Chem., Int. Ed. Engl. 1995, 34, 2311-2327; Sarma, J. A. R. P.; Desiraju, G. R. Polymorphism and Pseudopolymorphism in Organic Crystals: A Cambridge Structural Database Study. In Seddon, K.; Zaworotko, M. J., Eds.; Crystal Engineering, Kluwer: Norwell, MA, 1999; pp 325-356.
(13) Moulton, B.; Zaworotko, M. J. Chem. Rev. 2001, 101, 1629-1658.
(14) Lommerse, J.-P. M.; Motherwell, W. D. S.; Ammon, H. L.; Dunitz, J. D.; Gavezzotti, A.; Hofmann, D. W. M.; Leusen, F. J. J.; Mooij, W. T. M.; Price, S. L.; Schweizer, B.; Schmidt, M. U.; van Eijck, B. P.; Verwer, P.; Williams, D. E. Acta Crystallogr. 2000, B56, 697-714.
(15) Karfunkel, H. R.; Gdanitz, R. J. J. Comput. Chem. 1992, 13, 1171-1183.
(16) Verwer, P.; Leusen, F. J. J. Computer simulation to predict possible crystal polymorphs. In Lipkowitz, K. B.; Boyd, D. B., Eds.; Reviews in Computational Chemistry, 1998, 12, 327-365.
(17) Bar, I.; Bernstein, J. Eur. Cryst. Meeting, 1982, 7, 182.
(18) Indeed, a consideration of Table 1 would show that the structure derived from MOPRDB actually has a P.E. 200 cal/mol/molecule lower than the experimental structure that is derived from DTHPIM. This amount would be the minimum entropy difference between the two structures.
(19) Desiraju, G. R. Science 1997, 278, 404-405; Desiraju, G. R. Nature 2001, 412, 397-400.
(20) Sali, A.; Shakhnovich, E.; Karplus, M. Nature 1994, 369, 248.
(21) Blundell, T. L.; Carney, D. P.; Gardner, S.; Hayes, F. R. F.; Howlin, B.; Hubbard, T. J. P.; Overington, J. P.; Singh, D. A.; Sibanda, B. L.; Sutcliffe, M. J. Eur. J. Biochem. 1988, 172, 513-520; Blundell, T. L.; Elliott, G.; Gardner, S.; Hubbard, T.; Islam, S.; Johnson, M.; Mantafounis, D.; Murray-Rust, P.; Overington, J.; Pitts, J. E.; Sali, A.; Sibanda, B. L.; Singh, J.; Sternberg, M. J. E.; Sutcliffe, M. J.; Thornton, J. M.; Travers, P. Philos. Trans. R. Soc. London 1989, B324, 447-471; Sutcliffe, M. J.; Hayes, F. R. F.; Carney, D. P.; Blundell, T. L. Protein Eng. 1987, 1, 377-384; Sutcliffe, M. J.; Hayes, F. R. F.; Blundell, T. L. Protein Eng. 1987, 1, 385-392.

CG015576U