Tailor-Made Force Fields for Crystal-Structure Prediction - American

Jul 22, 2008 - Marcus A. Neumann* ... reference data, the choice of the figure of merit, the optimization algorithm, and the parameter-refinement stra...
1 downloads 0 Views 1MB Size
9810

J. Phys. Chem. B 2008, 112, 9810–9829

Tailor-Made Force Fields for Crystal-Structure Prediction Marcus A. Neumann* AVant-garde Materials Simulation, 30b rue du Vieil AbreuVoir, 78100 St-Germain-en-Laye, France ReceiVed: NoVember 2, 2007; ReVised Manuscript ReceiVed: March 7, 2008

A general procedure is presented to derive a complete set of force-field parameters for flexible molecules in the crystalline state on a case-by-case basis. The force-field parameters are fitted to the electrostatic potential as well as to accurate energies and forces generated by means of a hybrid method that combines solid-state density functional theory (DFT) calculations with an empirical van der Waals correction. All DFT calculations are carried out with the VASP program. The mathematical structure of the force field, the generation of reference data, the choice of the figure of merit, the optimization algorithm, and the parameter-refinement strategy are discussed in detail. The approach is applied to cyclohexane-1,4-dione, a small flexible ring. The tailor-made force field obtained for cyclohexane-1,4-dione is used to search for low-energy crystal packings in all 230 space groups with one molecule per asymmetric unit, and the most stable crystal structures are reoptimized in a second step with the hybrid method. The experimental crystal structure is found as the most stable predicted crystal structure both with the tailor-made force field and the hybrid method. The same methodology has also been applied successfully to the four compounds of the fourth CCDC blind test on crystal-structure prediction. For the five aforementioned compounds, the root-mean-square deviations between lattice energies calculated with the tailor-made force fields and the hybrid method range from 0.024 to 0.053 kcal/mol per atom around an average value of 0.034 kcal/mol per atom. 1. Introduction Many physical and chemical properties of molecular crystals, such as density, color, solubility and dissolution rate, strongly depend on the molecular packing in the solid state. The prediction of the experimentally accessible crystal structures therefore is a prerequisite for the computation of important solidstate properties from the molecular structure as a starting point. Although crystal-structure prediction is an essential step toward the distant goal of in silico materials design, one close-by application is the prediction of the most stable crystal polymorph for a given molecule. Especially in the pharmaceuticals industry, formulations based on the crystal polymorph with the lowest free energy are generally preferred, because they provide the best long-term stability. However, the stable crystal polymorph may only show up experimentally several years after the first synthesis of a new drug compound because of unfavorable crystallization kinetics. The computational prediction of the stable form would greatly reduce the risk of unpleasant surprises during clinical testing and production scale up. The important potential applications of crystal-structure prediction have triggered a lot of work, and the state of the art has been assessed in a series of blind tests organized by the Cambridge Crystallographic Data Centre.1–3 The problem of crystal-structure prediction can be roughly divided into two challenges: the generation of all relevant low-energy crystal packings and the identification of the experimentally accessible crystal structures from the list of candidate structures. The second issue is generally tackled by lattice energy calculations, by assuming that it is the most stable crystal structures which are actually observed in nature. The blind tests clearly have revealed energy ranking as the main bottleneck for the crystalstructure prediction of rigid or slightly flexible molecules. * Corresponding author. E-mail: [email protected]. Phone: +33 6 25 05 33 29.

Indeed, many of the experimental crystal structures were generated by one or more participants, but whether or not these structures were selected as one of the three allowed submissions per participant often turned out to be a matter of luck, because the lattice energy differences between the most stable crystal structures are typically smaller that the accuracy of common methods for lattice energy calculations. In an attempt to solve the energy ranking problem, a hybrid method has recently been developed4 that combines density functional theory (DFT) calculations by means of the VASP program5–9 with an empirical van der Waals (vdW) correction fitted to molecular C6 coefficients and to the unit cells of lowtemperature crystal structures. When initially applied to the energy ranking of crystal packings generated with Accelrys’ Polymorph Predictor10 in the 17 most frequent space groups, the hybrid method performed excellently for a series of six small and essentially rigid molecules, namely acetylene, ethylene, ethane, methanol, acetic acid, and urea. In five out of six cases, the experimental crystal structure was found as the most stable predicted crystal structure and for ethylene, it came second. The high potential of the hybrid method was confirmed by an additional energy ranking study11 on the submitted and the experimental crystal structures of the first two CCDC blind tests. Out of seven considered experimental crystal structures, six turned up among the two most stable predicted crystal structures. The remaining ranking imperfections are not necessarily related to the inherent inaccuracy of the hybrid method. Zero-point vibrational energies and entropy effects are currently not taken into account, and the energy ranking may improve further if free energies at the experimental temperatures are calculated rather than lattice energies. In addition, the stable polymorph may not yet have been observed in some cases. For the remainder of this work, it will be assumed that the hybrid method provides the accuracy required for crystal-structure prediction.

10.1021/jp710575h CCC: $40.75  2008 American Chemical Society Published on Web 07/22/2008

Tailor-Made Force fields for Crystal-Structure Prediction On its own, the hybrid method provides only a limited solution to the energy-ranking problem in crystal-structure prediction. By using standard force fields for the initial crystalstructure generation, the number of crystal structures in the energy window between the experimental crystal structure and the most stable predicted crystal structure grows rapidly with the molecular flexibility. As a consequence, it becomes increasingly time-consuming to treat a sufficiently large number of crystal structures in the final energy-ranking step. With the hybrid method, the final energy ranking is essentially limited to 10-100 crystal structures. The CPU time and memory requirements for the DFT component of the hybrid method strongly grow with system size. For a crystal structure with a reduced cell volume of 3000 Å3, a full crystal-structure optimization from a reasonably good starting point takes about 2-4 weeks of CPU time on two state-of-the-art PCs working in parallel. Hence, improvements of the current force-field technology remain of vital importance to narrow down the number of crystal structures to be optimized with the hybrid method. Because of the high CPU-time requirements involved, the direct use of DFT calculations during structure generation, as recently demonstrated for several inorganic and organic compounds including urea,12 will not become feasible for most organic compounds of industrial interest for many years to come. Force-field parametrization by using experimental data and/ or ab initio calculations is generally believed to belong to the realm of some enlightened specialists, being far too complicated for every day users of molecular modeling software. Most molecular modeling studies rely on one of the published force fields such as Dreiding,13 CHARMM,14 Amber,15 Merck Molecular Force Field,16 CVFF,17 COMPASS,18 UFF,19 or others. Molecule-specific reparameterization, if carried out at all, is generally limited to the charge model. For the purpose of crystalstructure prediction, it is widely admitted that atomic charges should be fitted to the electrostatic potential around the molecule derived from ab initio calculations. It has also been shown that the accuracy of force-field calculations can be improved if electrostatic multipoles are used instead of point charges.20–23 Compared to the parametrization of other force-field parameters, the extraction of atomic point charges and/or electrostatic multipoles from an electronic wave function is relatively straightforward, but the reparametrization of the charge model only on a molecule-by-molecule basis suffers from serious methodological shortcomings. In fact, molecular and crystalline equilibrium structures result from a fine balance of many energy terms, including electrostatic and vdW interaction potentials as well as bond stretch, angle bend, and torsion and inversion terms for covalently bonded atoms. In carefully parametrized force fields, the various terms complement each other. If one type of interaction is reparametrized, so should the others. The Coulomb energy between two atoms on neighboring molecules, for instance, does not only depend on the atomic charges but also on the close-contact distance which is strongly influenced by the repulsive part of the vdW potential. Intramolecular flexibility is another tricky issue. For a correct description of different molecular conformations, the intramolecular Coulomb and vdW interactions must be carefully balanced by torsion and inversion terms. If only the electrostatic terms are modified, this balance is lost. It is the main purpose of this paper to suggest a paradigm shift in the use and development of force fields and their associated software. Instead of supplying the user with one or more transferable force fields parametrized once and for all for different classes of compounds, software tools should provide

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9811

Figure 1. Overall strategy for crystal-structure prediction.

an accurate method for the generation of reliable reference data and also a robust procedure for the refinement of force-field parameters on a molecule-by-molecule basis. The requirement of transferability shifts from the force-field parameters themselves to the methodology for parameter generation. The proposed overall procedure for crystal-structure prediction is summarized in Figure 1. For a given flexible molecule, the hybrid method serves as a gold standard for the generation of reference data, which are used to parametrize a tailor-made force field. By using this force field, crystal structures are generated and optimized. From the parametrization step, the accuracy of the force field is roughly known and can be used to delimit a candidate window from which experimentally observable crystal structures may come. All crystal structures in the candidate window are reoptimized and reranked with the hybrid method, by using a quasi-Newton optimizer and starting Hessians calculated with the tailor-made force field. Alternatively, the size of the candidate window may be calculated on the fly during the final energy ranking with the hybrid method (see eqs 13-15 in ref 4). The accuracy of the tailor-made force field affects the CPU-time requirements of the final energy ranking in three different ways. With increasing force-field accuracy, the size of the candidate energy window and thus the number of structures to optimize both decrease. In addition, the equilibrium crystal structures obtained with the force field, which serve as starting points for the final crystal-structure optimization, approach the energy minima according to the hybrid method. Furthermore, the accuracy of the starting Hessians improves. On the whole, the quality of the force-field parameter refinement has a very strong impact on the CPU-time requirements of the final energy-ranking step, and spending a significant fraction of the overall CPU time for a crystal-structure prediction study on force field refinement is definitely a wise move. At present, the mathematical framework of the tailor-made force fields is rather simple, involving Coulomb interactions between fixed atomic point charges calculated from bond increments, isotropic vdW potentials, and essentially uncoupled bonded energy terms. This choice of simplicity has been motivated by two different factors. From a pragmatic point of view, a simple mathematical framework can be expected to be good enough for molecules with a zero net charge, if the tailormade force fields are used only for structure generation but not for the final energy ranking. Second, force-field parametrization on a molecule-by-molecule basis is a difficult endeavor because of the large number of parameters involved, and it is preferable to first test the proposed parameter refinement strategy in conjunction with a relatively simple mathematical framework before moving on to more sophisticated energy terms with an increased number of parameters to refine. In this context, it is important to note that the added benefit of more complex

9812 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann

electrostatic charge models as explored by several authors20,21,23,24 is not well characterized, because the reparametrization of important, interdependent force-field parameters is virtually always missing. Despite the fact that the tailor-made force fields are based on the same approximations as other common force fields, their energy terms are different in many important respects, because the requirements for transferable and tailormade force fields are very different. In the following, frequent comparison will be made between the tailor-made force fields proposed here and the Dreiding13 force field, from which starting values for force-field parameters are borrowed whenever possible. In addition to a mathematical framework, force-field parameter refinement requires reference data, a deviation function (figure of merit), a refinement algorithm, and a recipe. All these ingredients are described separately in Sections 2-6. The hardware setup and some parallelization issues are briefly discussed in Section 7, followed by a case study on cyclohexane1,4-dione in Section 8. The validation study on cyclohexane1,4-dione exemplifies many but not all of the theoretical aspects developed in sections 2-6. The proposed methodology has been further tested by successful application to the four compounds of the fourth CCDC blind test on crystal-structure prediction.25 No detailed account of these additional validation studies is given here, because they are important projects in their own right. However, the accuracy of the tailor-made force fields obtained for the four additional compounds is briefly discussed in Section 9. In all validation studies, a structure-generation engine has been used that is still under development. The current prototype provides a quasi complete list of low-energy crystal packings, searching in all 230 space groups with one or two flexible molecules per asymmetric unit. A more detailed description of the structure-generation engine goes beyond the scope of this communication and will be published elsewhere. All algorithms described in the present paper, including the hybrid method and the structure-generation engine, have been implemented in the program GRACE (Generation Ranking And Characterization Engine).26 2. Mathematical Framework For a given crystal, the total energy per unit cell as a function of the atomic coordinates is approximated as the sum over Coulomb terms, vdW terms, hydrogen-bond terms, bond-stretch terms, angle-bend terms, overall-torsion terms, overall-inversion terms, and angle-bendsinversion coupling terms.

Ecell )

∑ i∈cell ∑ j∈crystal,j*i 21 ECoulomb,ij + ∑ i∈cell ∑ j∈crystal,j*i 21 EvdW,ij +

∑ i∈cell,i∈A ∑ j∈crystal,j∈H Ehb,ij + ∑ i∈cell ∑ j∈B(i) 21 Estretch,ij + ∑ j∈cell ∑ i∈B(j) ∑ k∈B(j),k*i 21 Ebend,ijk + ∑ j∈cell,j∉T ∑ k∈B(j),k∉T 21 Eoverall_torsion,jk + ∑ i∈cell,i∈N Eoverall_inversion,i + ∑ i∈cell,i∈N Eabic,i (1) 3

3

The factor 1/2 used under several sums avoids double counting. The notations i∈cell and i∈crystal indicate that the corresponding sums run over all atoms in the unit cell or in the crystal,

respectively. In the second case, appropriate truncation is required for practical implementation. The notations i∈A, i∈H, i∈B(j), i∉T, and i∈N3 specify that i is, respectively, a hydrogenbond acceptor (A), a hydrogen atom involved in hydrogen bonding (H), covalently bonded to an atom j (B(j)), not a terminal atom (T), or an atom with three covalently bonded neighbors (N3). Some of the terms under the sums in eq 1 may actually be switched off. For atoms with three covalently bonded neighbors, for example, either a single angle-bendsinversion coupling term or one overall inversion term together with three angle-bend terms is used. It is important to realize that there is a conceptual difference between the energy expression specified in eq 1 and a force field. A force field is a set of rules, functions, and parameters that may apply to many different model systems, whereas an energy expression results from the application of a force field to one particular model system. For instance, a force field for water may be used to set up energy expressions for an isolated water molecule, disordered water molecules in a simulation box, or ordered water molecules representing the crystal structure of ice. The following subsections expose the rules and functions of the tailor-made force fields from which energy expressions such as eq 1 are constructed. 2.1. Force-Field Atom Types. In transferable force fields, force-field atom types (FFATs) are used to limit the number of distinct interactions for which force-field parameters have to be specified. Chemically similar atoms, for instance, all atoms of the same element type and in the same hybridization state, are attributed the same FFAT, either manually or automatically by using well-defined typing rules. The energy terms of a force field are defined with respect to the FFATs; that is, for each type of interaction, the force field specifies not only the parameters and a mathematical function but also the FFATs of the atoms that participate in the interaction. The Dreiding force field, for example, distinguishes only four FFATs for carbon atoms, namely C_1, C_2, C_R, and C_3. Consequently, there is a maximum of 10 different bond-stretch terms for carboncarbon bonds and in practice even less because several bonds are described by the same parameters. A tailor-made force field is meant to describe the geometry and the energetics of a given molecule as closely as possible, and the limitation to a small number of FFATs and energy terms is undesirable. However, it would be counterproductive to attribute an individual FFAT to every atom, that is, to discard the concept of FFATs altogether. The energy terms of a force field must be defined such that the same total energy is found for equivalent molecular conformations, which can be achieved by attributing the same FFATs to equivalent atoms. For a molecule with a methyl group, for example, the total energy should be the same for the three conformations that can be obtained by rotating the methyl group through 120°. This constraint can be easily met by using the same FFATs for the three hydrogen atoms of the methyl group. Because the choice of appropriate FFATs conditions all subsequent work with a tailor-made force field, the generation of FFATs on a moleculeby-molecule basis has been automated. The applied rules, in particular with respect to atomic equivalence, will now be discussed. The names of most of the FFATs follow the scheme Xn_m_Ns, where X, n, m, N, and s are the element symbol, the number of covalently bonded neighbors, a running number, a user-defined name included in all FFATs belonging to the same molecule, and an inversion indicator (see below), respectively. For hydrogen atoms involved in hydrogen bonding, a

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9813

Figure 3. Atom indices for the definition of the handedness (a) and the sp2-sp2 rotation (b) criterion in definition 1.

around a central atom with four covalently bonded neighbors does not change. In addition, rotations around sp2-sp2 double bonds are not possible in many cases. For computational applications, it is more convenient to transform the above statements to a set of conditions that can be checked against a given conformation of a molecule. Definition 1 (Atomic EquiWalence). Two atoms with indices i and j in a molecule are considered to be equivalent, if a permutation P within the indices of all atoms in the molecule exists that meets all of the following conditions. (a) Atomic match: i ) P(j). (b) Element equivalence: the atoms k and P(k) belong to the same element for all k. (c) Bond topology: if two atoms, k and l, are covalently bonded, so are the atoms P(k) and P(l), and vice versa. (d) Handedness: for atoms with four covalently bonded neighbors, the permutation does not change the handedness (see Figure 3a for atom indices). The handedness is maintained if the following condition is true.

sign{(r bn - b rk) · [(r bm - b rk) × (r bl - b rk)]} ) sign{(r bP(n) b rP(k)) · [(r bP(m) - b rP(k)) × (r bP(l) - b rP(k))]} (2)

Figure 2. FFATs generated automatically for some molecules. The FFAT in brackets for case b is obtained when the optional condition e of definition 1 is not applied.

In the above equation, b rk is the position of atom k in Cartesian coordinates. (e) sp2-sp2 rotations (optional): rotations around sp2-sp2 bonds (see Figure 3b for atom indices) frequently involve high potential-energy barriers and are not energetically possible. Permutations that do not involve rotations around sp2-sp2 bonds respect the following condition for each of these bonds.

sign{[(r bk - b rm) × (r bl - b rm)] · [(r bp - b rn) × (r bo - b rn)]} ) sign{[(r bP(k) - b rP(m)) × (r bP(l) - b rP(m))] · [(r bP(p) - b rP(n)) × (r bP(o) -

slightly different scheme of the type XY_m_Ns is used, where Y is the element symbol of the hydrogen-bond donor. Both naming schemes differ only with respect to the primary FFAT, which is defined as the beginning of the type name up to the first underscore. Some typical type names are shown in Figure 2. Two atoms, i and j, are considered to be equivalent and thus attributed the same FFAT, if there is an energetically possible atomic rearrangement that brings every atom in a molecule onto a position formerly occupied by another atom of the same element type, with atom j ending up at the former location of atom i. The notion of energetically possible rearrangement is related to the energy barriers that can be overcome in the molecular mechanics or dynamics calculations for which the FFATs are generated. If all atoms were allowed to move around freely, including bond breaking, all atoms of the same element type would have to be attributed the same FFAT according the above definition. In the energy range relevant for the structure prediction of molecular crystals, bond breaking and bond making can be neglected, and the handedness of the atomic arrangement

b rP(n))]} (3) Figure 2a shows the FFATs obtained for paracetamol according to the above rules. Figure 2b presents the FFATs generated for another molecule with and without the optional condition e. The concept of atom equivalence elaborated above requires one further extension to be genuinely satisfactory, as illustrated in Figure 2c. The atoms O2_0_c+ and HO_0_c+ are not equivalent to the atoms O2_0_c- and HO_0_c-, because the permutation that exchanges these atoms does not comply with condition d of definition 1. However, it is intuitively clear that there has to be a symmetry relationship between the energy terms for the two hydroxyl groups which should be reflected by the choice of appropriate FFATs. The two oxygen atoms and the two hydrogen atoms in Figure 2c are inversionequivalent, whereby two atoms are deemed inversion-equivalent if there is an inversion followed by an energetically possible rearrangement that brings every atom in a molecule onto a position formerly occupied by an atom of the same element,

9814 J. Phys. Chem. B, Vol. 112, No. 32, 2008 with atom j ending up on the former position of atom i. In mathematical terms, inversion equivalence is defined as follows. Definition 2 (Atomic InWersion EquiWalence). Two atoms with indices i and j in a molecule are considered to be inversionequivalent, if definition 1 is applicable with the exception of condition d, which now takes a slightly different form. (d’) Handedness: for atoms with four covalently bonded neighbors, the permutation changes the handedness (see Figure 3a for atom indices), which requires the following to hold.

Neumann TABLE 1: Molecular Typing Rule for the Molecule Shown in Figure 2ba

sign{(r bn - b rk) · [(r bm - b rk) × (r bl - b rk)]} ) -sign{(r bP(n) b rP(k)) · [(r bP(m) - b rP(k)) × (r bP(l) - b rP(k))]} (2a) If a molecular conformation can be mapped onto itself by a combination of an inversion and an energetically possible atomic rearrangement, force-field calculations should yield the same total energy for the two atomic arrangements. One may be tempted to ensure the expected result by attributing the same FFATs to inversion-equivalent atoms, but this approach only works if the potential-energy functions for torsions and inversions do not contain any uneven terms, because torsion angles and inversion distances change sign upon inversion (see Sections 2.7 and 2.8). To allow for uneven terms, the following has been implemented. If two atoms are inversion-equivalent but not equivalent, they are attributed FFATs that differ only at the last character, the inversion indicator s (see above) being set to + for one atom and to - for the other. If an atom does not have a counterpart that is inversion-equivalent without being equivalent, the inversion indicator is just an empty string. With respect to the definition of force-field energy terms, the inversion indicator is used as follows. If a bond-stretch term is defined for two covalently bonded atoms with FFATs t1 and t2, the same bondstretch term is also applied to two covalently bonded atoms with FFATs t1-1 and t2-1, where the exponent indicates an operation that inverts the inversion indicator if it is not an empty string. Similarly, angle-bend terms defined for three FFATs t1, t2, and t3 are also valid for FFATs t1-1, t2-1, and t3-1. For torsion terms, the situation is slightly different. If a torsion term is defined for FFATs t1, t2, t3, and t4, there also is an equivalent energy term for the FFATs t1-1, t2-1, t3-1, and t4-1, but the torsion angle is measured clockwise in the first case and counterclockwise in the second case. Inversion terms are dealt with in the same way as torsion terms; that is, the measurement direction is inverted for the inversion-equivalent FFATs. Figure 2d shows the FFATs obtained for cyclohexane-1,4-dione by using the concepts of atomic equivalence and inversion equivalence. The automatic generation of a typing rule is as important as the automatic generation of FFATs. Typing rules can be used in computer programs to automatically attribute FFATs to every atom in a crystal structure. The tailor-made force fields use molecular typing rules that are applied to groups of covalently bonded atoms. A crystal structure is first decomposed into molecules, that is, sets of atoms that contain all covalently bonded neighbors for each atom in the set. The algorithm then tries to number all atoms in each set in such a way that they meet one of the molecular typing rules of the tailor-made force field. A typical typing rule is presented in Table 1 for the molecule shown in Figure 2b, whereby the optional condition e has been applied. For every atom in a molecule, the typing rule specifies an atom index, i, the force field type, ti, the element, ei, a conformation indicator, σi, and the indices of all covalently bonded neighbors, bi,j, where j is an index running from 1 to the number of atoms bonded to i. The conformation indicator can only take the values -1, 0, and 1. For a typing

a

i

ti

ei

σi

bi,j

1 2 3 4 5 6 7 8 9

H1_0_b Cl_0_b F1_0_b C4_0_b C3_0_b O1_0_b N3_0_b HN_0_b HN_1_b

H Cl F C C O N H H

0 0 0 -1 1 0 -1 0 0

4 4 4 5321 746 5 589 7 7

See text for details.

rule to apply to a set of atoms, the atoms must be numbered such that each atom has the element type and the bond partners specified by the typing rule. For atoms with three or four covalent bond partners, additional criteria apply if the conformation indicator is not zero. An atom, i, with four bond partners must meet the handedness criterion.

σi(r bbi,4 - b rbi,1) · [(r bbi,3 - b rbi,1) × (r bbi,2 - b rbi,1)] > 0

(4)

Two covalently bonded atoms, i and j, with three bond partners each and nonzero conformation indicators must fulfill the sp2-sp2 rotation criterion.

σiσj[(r bbi,2 - b rbi,1) × (r bbi,3 - b rbi,1)] · [(r bbj,2 - b rbj,1) × (r bbj,3 b rbj,1)] > 0 (5) A force field should yield the same energy for a chiral molecule and its mirror copy, and a single molecular typing rule should cover both cases. Therefore, if a group of atoms cannot be typed with the straight typing rule, the inverse typing rule is also applied, in which the signs of all conformation indicators are inverted. Again, it must be remembered that torsion angles and inversion distances change sign upon inversion. Hence, for a chiral molecule typed with the straight typing rule and its mirror copy typed with the inverse typing rule, torsion angles and inversion distances must be measured in opposite directions. 2.2. Electrostatic Interactions. The electrostatic interaction between two atoms, i and j, is described in terms of a Coulomb interaction between fixed atomic point charges, Qi and Qj.

ECoulomb,ij ) SCoulomb,ij

1 QiQj 4πε0 Rij

(6)

Rij is the distance between the atoms, ε0 is the vacuum dielectric constant, and SCoulomb,ij is a scale factor which is 0 for next and second next neighbor interactions and equal to 1 in most other cases (see Section 2.10 for exceptions). The total electrostatic energy of a crystal is computed by means of Ewald summation with automatic parameter estimation. The handling of the atomic point charges merits some further comments. The atomic charges used with many force fields are not related to force-field parameters but determined by other methods and stored as properties of the structural model. This choice was unavoidable for the early force fields, where the number of FFATs is too small to allow for the accurate enough description of the molecular charge distribution. By using the procedure for the generation of FFATs outlined in the previous subsection, atomic charges can be treated as force-field parameters without loss of accuracy. In the tailor-made force fields, atomic charges are exclusively determined by the FFATS and defined via bond increments.18

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9815

For every pair of covalently bonded atoms, i and j, with FFATs ti * tj, a bond increment δtitj must be defined which contributes the amounts + δtitj and -δtitj to the total charge of atom i and atom j, respectively. For neutral molecules, bond increments offer the same flexibility for the description of the molecular charge distribution as that for charges directly associated with FFATs but present the additional advantage of obeying the charge-neutrality constraint by definition. To make the concept of bond increments applicable to molecular salts, a further generalization is required. Consider a molecule with a COOH group that donates a proton to a molecule with an NH2 group and thus acquires a negative net charge. This charge is actually less than the formal charge of -1.0, because the positively charged NH3 group captures some of the electronic charge density around the negatively charged CO2 group. The charge transfer between the two groups can be described by introducing a charge-transfer increment for the oxygen and hydrogen atoms. A charge transfer increment δ˜ t1t2 can be defined for any two FFATs, t1 and t2, where the corresponding atoms do not have to be bonded. The charge-transfer increment transfers the charges ∆qt1 and ∆qt2 to every atom with the FFAT t1 and t2, respectively, with

∆qt1 ) +

Nt1 + Nt2 2Nt1 Nt1 + Nt2 2Nt2

˜ δ t1t2

(7)

˜ δ t1t2

(8)

Here Nt1 and Nt2 are the number of atoms with the corresponding FFAT in the unit cell. The use of eqs 7 and 8 ensures that the total charge of the unit cell remains zero regardless of the stoechiometry and the amount of charge transfer. If a bond increment or a charge-transfer increment is defined for two FFATs, t1 and t2, it is also valid for the inversionequivalent FFATs, t1-1 and t2-1. 2.3. vdW Interactions. To describe long-range dispersive interactions and hard core repulsions, isotropic atom-atom pair potentials of the exponential-6 type are used.

[

EvdW,ij ) SvdW,ij Atitj exp(-BtitjRij) + C6,titj

2 2 (C6,titiNeff,t )1⁄3 + (C6,tjtjNeff,t )1⁄3 j i

1 Rij6

]

(9)

The summation over pair potentials is carried out in direct space, and interactions beyond the direct space cutoff are taken into account by means of a continuum correction. Below the turning point, left to the minimum of eq 9, the 1/R6 term is replaced by a first order polynomial to avoid divergence at short distances, where the coefficients of the first order polynomial are chosen so as to yield continuous function values and first derivatives at the turning point. As for Coulomb interactions, SvdW,ij is a scale factor which is 0 for next and second next neighbor interactions and 1 in most other cases (see Section 2.10 for exceptions). The parameters A, B, and C must be defined explicitly for all homoatomic interactions. Parameters for heteroatomic interactions are either specified explicitly or derived from the homoatomic parameters by means of combination rules.

Atitj ) √AtitiAtjtj

(10)

1 Btitj ) (Btiti + Btjtj) 2

(11)

(12)

In eq 12, Neff refers to the effective electron numbers according to Halgren.27 For the C6 coefficients, the same combination rule is used as that used with the hybrid method.4,28 In general, the same vdW parameters, A and B, are used for all atoms of the same element, with exception of hydrogen atoms involved in hydrogen bonding that are dealt with separately. The C6 parameters depend on the primary FFATs only. Typically, starting values for the C6 parameters are taken from the hybrid method, and a single scale factor is used for all of them in the parameter refinement. Tests for several molecules have shown that the use of individual vdW parameters for every FFAT does not increase the overall accuracy of the tailor-made force field significantly. The explicit refinement of certain heteroatomic interaction parameters, on the other hand, strongly improves the overall accuracy. 2.4. Hydrogen-Bonding Terms. The tailor-made force fields allow for the definition of hydrogen-bond terms by using the same potential-energy function as the Dreiding force field.

[( ) ( ) ]

Ehb,ij ) Dtitjtk 5

and

∆qt2 ) -

C6,titj )

2 2(C6,t C2 N N )1⁄3 iti 6,tjtj eff,ti eff,tj

Rtitjtk Rik

12

-6

Rtitjtk

10

Rik

cos4(θijk)

(13)

Here, θijk is the bond angle between the hydrogen-bond donor k, the hydrogen j, and the hydrogen-bond acceptor i, and Rik is the distance between the donor and acceptor atoms. Dtitjtk is the depth of the hydrogen-bond potential, and Rtitjtk is the equilibrium distance. Experience gained from the parametrization of several tailormade force fields has indicated that special hydrogen-bond terms are not required as long as bond increments and vdW parameters,includingheteroatomicparametersforthehydrogensacceptor interaction, are optimized simultaneously. 2.5. Bond-Stretch Terms. Harmonic bond-stretch terms are evaluated for all pairs of covalently bonded atoms, i and j, with FFATs, ti and tj.

1 Eij ) Ktitj(Rij - Rtitj,0)2 2

(14)

Starting values for the parameters Ktitjtk and Rtitjtk,0 are taken from the Dreiding force field. 2.6. Angle-Bend Terms. Harmonic angle-bend terms are evaluated for all pairs of atoms, i and k, covalently bonded to a central atom, j.

1 Eijk ) Ktitjtk(θijk - θtitjtk,0)2 2

(15)

Here, θijk is the angle between the bonds i-j and j-k. Starting values for the parameters Ktitjtk and θtitjtk,0 are taken from the Dreiding force field. 2.7. Torsion Terms. In most transferable force fields, torsion terms are evaluated for all quadruplets of atoms i, j, k, and l connected via three adjacent covalent bonds, where each torsion term is an even function of the corresponding torsion angle, for instance,

1 Eijkl ) Vtitjtktl[1 - Stitjtktl cos(ptitjtktlφijkl)] 2

(16)

Here, V is the torsion barrier, S can take the values +1 and -1, and p characterizes the periodicity of the torsion potential. The torsion angle φijkl is measured clockwise when looking down the j-k bond (see Figure 4). Because there are several

9816 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann (see Figure 4), and must not be confused with the vdW parameters A and B in Section 2.3. The torsion energy as a function of the overall torsion angle is defined in terms of a Fourier series.

Eoverall_torsion ) E0 +

∑ cn cos[pn(φoverall - φn)]

(18)

n)1

Figure 4. Atom indices for overall torsions.

Figure 5. Atom indices for overall inversions and inversionsangle bend coupling.

different choices for the terminal atoms, the overall torsion potential for the rotation around a central bond results in general from the superposition of several different torsion terms and may have an uneven component. The limitation to even torsion terms offers the advantage that the sign change of the torsion angles upon inversion of the atomic coordinates must not be worried about. The presence of several torsion terms for every central bond is not a problem if the corresponding force-field parameters result from an educated guess or are fitted to a large body of data involving only a small number of FFATs. In the case of a tailor-made force field with its large number of FFATs, on the contrary, the number of independent torsion terms needs to be reduced to avoid a high level of redundancy, which would complicate the parameter refinement significantly. Therefore, a different strategy is adopted, where the torsion around a central bond is characterized by a single, overall torsion angle which depends on the coordinates of the two central atoms and all their covalently bonded neighbors. One of the individual torsion angles is chosen as the representative torsion angle, and all other individual torsion angles are projected onto the representative torsion angle. The overall torsion angle is the average over all projected torsion angles. A-1 B-1

φoverall )

360° ab∑ ∑ [Ρ]-180,180](φi jkl + 360° A B

1 AB a)0

b)0

a

For the sake of simplicity, the indices which indicate the dependence of the parameters E0, cn, p, and φn on the FFATs of the involved atoms have been omitted. Because there now is only a single torsion term for each central bond, the uneven component of the overall torsion potential must be included explicitly in the overall torsion term. Hence, the phase angles φn are in general different from 0 or 180°. The uneven component requires a special treatment of torsion terms related by inversion symmetry (see Section 2.1). Depending on the molecular symmetry, as reflected by the FFATs of the tailormade force field, the phase angles φn may have to be equal to 0 or 180°. This is the case when the FFATs of the external atoms obey the equalities tia ) tiA-a and tib ) tiB-b. The periodicity parameter p in eq 18 depends on the FFATs of the atoms i0, ..., iA-1 and l0,..., lB-1. Let pi and pl be the maximum number of identical subsequences that the sequences of FFATs ti0,..., tiA-1, and tl0,..., tlB-1can be decomposed into. Then, p is the smallest common multiple of pi and pl. In the case of nitromethane, for instance, the carbon atom has three external neighbors with identical FFATs (ti0 ) ti1 ) ti2). The sequence of these three FFATs can be decomposed into three identical subsequences containing only a single FFAT (pi ) 3). For the nitrogen atom, there are two external neighbors with identical FFATs (tl0 ) tl1) and two identical subsequences containing only a single FFAT (pl ) 2). The smallest common multiple of 2 and 3 is 6, which is the appropriate periodicity for the torsion potential of nitromethane. 2.8. Inversion Terms. In transferable force fields, inversion terms typically are an even function of a parameter that measures the out-of-plane motion of the central atom surrounded by three covalently bonded neighbors (see Figure 5). The Dreiding force field, for instance, uses inversion terms

Eijkl ) Ktitjtktl(1 - cos ψijkl)

where ψijkl is the angle between the i-j bond and the j-k-l plane. The limitation to even inversion terms is not always appropriate. Especially in larger molecules, the central atom may systematically prefer one particular side of the plane through its neighbors because of steric effects. Such a situation is difficult to parametrize if uneven terms are not allowed. Inversion terms such as the one shown in eq 19 have the additional disadvantage that the mathematical description of the inversion potential is only symmetric with respect to the three surrounding atoms if three individual inversion terms are taken into account for each central atom, a redundancy that needs to be avoided in a tailormade force field. The inversion potentials are parametrized in terms of a Taylor series of the signed center-to-plane distance (inversion distance) d, which is invariant with respect to a cyclic permutation of the surrounding atoms.

b

)

]

φi0jkl0 + φi0jkl0 (17) In the above equation, Ρ]-180,180] is a projection operator that adds a multiple of 360° to its argument such that the result falls into the interval ] -180°,180°]. A and B are the external number of covalently bonded neighbors of the atoms j and k, respectively

(19)

dijkl ) (r bi - b rj) ·

(r bl - b rj) × (r bk - b rj) |(r bl - b rj) × (r bk - b rj)|

(20)

∑ ct t t t ,ndijkln

(21)

Eoverall_inversion,i )

n)0

ijkl

For every central atom, there is only a single overall-inversion term. For the sign of the inversion distance to be well defined, the order

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9817

of the surrounding atoms must be unambiguous; that is, the three neighbors of the central atom need to have different FFATs. If this is not the case, uneven terms are not allowed in eq 21. 2.9. Angle BendsInversion Coupling. It has been observed that the conformational flexibility around atoms with three covalently bonded neighbors is not always described sufficiently well by a combination of separate inversion and angle-bend terms. Therefore, a combined angle bendsinversion term has been introduced which allows for a more detailed description of the potential energy.

1 1 Eabic ) Einv(d) + k1(d)(R1 - R1,eq(d))2 + k2(d)(R2 2 2 1 R2,eq(d))2 + k3(d)(R3 - R3,eq(d))2 (22) 2 R3,eq(d) ) 360° - R1,eq(d) - R2,eq(d) (23) The terms Einv(d), k1(d), k2(d), k3(d), R1,eq(d), and R2,eq(d) are defined as Taylor series of the inversion distance d in analogy to eq 21. The angles R1, R2, and R3 are obtained by projection of the bonds i-j, i-k, and i-l onto the plane through the surrounding atoms (see Figure 5). If the FFATs of two or more surrounding atoms are identical, symmetry constraints apply to k1, k2, k3, R1,eq, and R2,eq. As in the case of overall torsions, the atomic indices and the reference to FFATs have been omitted. 2.10. Intramolecular Coulomb and vdW Scale Factors. In common force fields, intramolecular Coulomb and vdW interactions between first and second next neighbors are not taken into account. From third next neighbors onward, intramolecular interactions are evaluated in the same way as intermolecular interactions, where all third next neighbor interactions may sometimes be scaled down by a single scale factor. Experience with force-field fitting indicates that the full consideration of Coulomb and vdW interactions from third next neighbors onward is a valid choice for most atoms in a molecule, but there can be exceptions. In the case of a bulky side group attached to an aromatic ring, for example, it can happen that the contribution of the intramolecular Coulomb and vdW interactions to the force constant of the in-plane bend mode is bigger than the target force constant implicitly provided by the reference data, thus giving rise to a negative force constant for some of the angle-bend terms after parameter refinement. To avoid unphysical force-field parameters in cases like this, it is necessary to introduce a general approach for the scaling of intramolecular Coulomb and vdW interactions on a case-bycase basis. GRACE allows for the definition of one or more sequences of FFATs, where each sequence is accompanied by two scale factors for the Coulomb and the vdW interactions. For every chain of covalently bonded atoms in agreement with one of the sequences of FFATs, the corresponding scale factors are applied to the interactions between the terminal atoms of the chain. These scale factors correspond to SCoulomb,ij and SvdW,ij in eqs 6 and 9. The scaling is not necessarily limited to third next neighbor interactions and may also be used to obtain a more accurate description of intramolecular hydrogen bonds in cases where the hydrogen and the acceptor atom are many covalent bonds apart. 3. Reference Data An extensive set of reference data is required to determine appropriate values for the many force-field parameters described in the previous section. The hybrid method,4 which combines DFT calculations by means of the VASP program with an empirical vdW correction, is the natural choice for the generation

Figure 6. Grid points for the characterization of the electrostatic potential (Ri ) RvdWfinner, Ro ) RvdWfouter).

of reference data, because up to now, it is the only approach that has proven accurate enough for the lattice energy ranking of molecular crystals. VASP uses a plane-wave basis set and periodic boundary conditions. For the refinement of some forcefield parameters, in particular those of intramolecular energy terms, it would in principle have been more convenient to use a DFT code based on atomic orbitals and infinite boundary conditions. However, this option has been discarded, because it requires as a prerequisite the nontrivial task of deriving an empirical vdW correction for an isolated molecule DFT code at a high level of accuracy. The hybrid method produces three basic data types: electrostatic potential information, lattice energies, and lattice energy derivatives. On the basis of these basic data types, as shown below, it is possible to construct various data sets which are sensitive to different force-field parameters and intervene at different stages of the parameter refinement process. 3.1. Electrostatic Data. The electrostatic potential is calculated from the electronic wave function for a set of points in a unit cell. The points are taken from a regular grid with an approximate spacing, ∆, from which all points that are closer than rvdWfinner to any atomic nucleus and further away than rvdWfouter from all atomic nuclei (see Figure 6) have been removed. Symmetry-related grid points are represented by a single grid point with the appropriate multiplicity. Typically, values of ∆ ) 0.2 Å, finner ) 1.5, and fouter ) 2.5 are used. A factor of 1.5 for the inner radius is required, because the electrostatic potential must be evaluated outside the molecular electron cloud. In a densely packed crystal structure, there is hardly ever any empty space 1.5 vdW radii away from the atomic positions. Therefore, crystal structures are used that have been expanded such that the distance between atoms on neighboring molecules obeys the relationship

rij g (rvdW,i + rvdW,j)(1 + fexpansion)

(24)

During the expansion, the molecular geometries and the fractional coordinates of the molecular positions in the unit cell are held constant. The three factors finner, fouter, and fexpansion must be chosen consistently, and typically, fexpansion is set such that the shell of acceptable points around a molecule does not significantly overlap with the exclusion zone of a neighboring molecule, which leads to the inequality

rvdW,i fouter + rvdW,j finner e (rvdW,i + rvdW,j)(1 + fexpansion) (25) for all atoms i and j. With values for finner and fouter as mentioned above, fexpansion ) 1 is a reasonable choice if all atoms have

9818 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann

similar vdW radii, whereas fexpansion)1.5 should be used if the vdW radii are very dissimilar. Electrostatic data are calculated for all important conformations of a molecule. Prior to the calculation of the electrostatic potential, the molecular geometry is optimized with the hybrid method, where the position and orientation of the molecule as well as the parameters of the expanded unit cell are held constant. 3.2. Minimization Data. Rigid-molecule (internal degrees of freedom are ignored) minimization data are used for the parametrization of nonbonded interactions. In addition, minimization data for flexible molecules offer the possibility to feed the results of an energy ranking study with the hybrid method back into the parameter refinement. Each minimization data set consists of a series of crystal structures optimized under the same conditions and composed of the same molecule or the same set of molecules, such that lattice energies of all crystal structures are comparable. Throughout the remainder of this work, such a set will be called a similarity set. Minimization data are typically provided for several different similarity sets. Each minimization data set provides two different types of information: lattice energy differences between the optimized crystal structures and energy derivatives obtained for each crystal structure individually. Depending on the convergence criteria of the crystal-structure optimization, the energy derivatives calculated with the hybrid method are more or less close to zero. As shown in Appendix A, the integration of the energy derivatives into the deviation function requires a second-order derivatives matrix (Hessian matrix). A starting guess of the Hessian is calculated numerically by using a transferable force field prior to each optimization. After crystal-structure optimization with a quasi-Newton algorithm, an improved guess of the Hessian is available which is used for the construction of the deviation function. 3.3. Packing Data. Packing data are used for the refinement of nonbonded interactions. They are similar to minimization data but offer some additional features. Molecules are always treated as rigid bodies, and the molecular geometry must be the same for all structures in a similarity set. After each lattice energy minimization, the lattice energy and its first derivatives are evaluated for a set of points around the local minimum. For each degree of freedom (lattice deformations and whole molecule translations and rotations), two points are considered corresponding to the positive and the negative displacement direction. The distance with respect to the local minimum is chosen such that the potential energy increases by roughly

1 ∆E ) kTRT 2

(26)

where k is the Boltzmann constant and TRT is the room temperature. Hence, the potential-energy well is probed on a length scale which corresponds to the thermal occupation of the configuration space at room temperature. Quantum effects do not need to be considered, because the energy spacing of the vibrational modes involving whole molecule translations and rotations is sufficiently small. For the exploitation of packing data in the deviation function, a Hessian matrix is required, which is calculated numerically from the energy derivatives at the points around the potential-energy minimum. Packing data provide four different types of information: energies and energy derivatives at and around local minima. 3.4. Expansion Data. Expansion data are useful for the calibration of the depth of nonbonded potentials. For a set of crystal structures, every member is expanded by a certain

amount, where the molecular geometry and the molecular position in fractional coordinates are held constant. The lattice energy is evaluated for the original structure, the final structure, and if required, some intermediate configurations. 3.5. Conformation Data. Conformation data are used to derive parameters for bonded interactions that accurately describe the principal conformations of a given molecule, including the curvature around the local minima. The members of each conformation data set are the different conformations of one and the same molecule in a periodic cell with P1 symmetry. The cell size is chosen such that there is some empty space around the molecule. In a first step, the molecular geometry is optimized, keeping the molecular position and orientation as well as the cell parameters constant. The minimization is carried out with a quasi-Newton algorithm, and the updated second-order derivatives matrix resulting from the minimization is used for the calculation of approximate vibrational eigenmodes. For every eigenmode with an angular frequency ω, the lattice energy and the first derivatives are calculated at two points away from the minimum, where the distance is chosen to correspond to an energy increase equal to the average potential energy of the eigenmode () half of the total energy29) at room temperature.

1 1 + exp(-pω ⁄ kTRT) ∆E ) pω 4 1 - exp(-pω ⁄ kTRT)

(27)

Like packing data, conformation data provide energies and energy derivatives at and around local minima, and the potentialenergy wells are probed on length scales corresponding to thermal excitation, but this time, quantum effects including zeropoint motions are explicitly taken into account. Again, for the construction of the deviation function, a Hessian matrix is required which is calculated numerically from the energy derivatives around the minimum. 3.6. Wide Amplitude Data. Wide amplitude data characterize the energy barriers between molecular conformations. Each data set consists of a series of frames containing a single molecule in a large simulation box that gradually evolves from one conformation to another. Conformational changes are produced by adding additional restraint potentials to an energy expression. Restraint potentials are nothing else than normal torsion or inversion terms with very high force constants. For flexible rings, restraint potentials are defined for all ring torsions. To generate the various frames of a wide amplitude data set, the reference parameters of the restraint terms (target values for torsion angles or inversion distances) are changed stepwise from the values of one conformation to the values of another conformation. Each frame is first optimized with a force field, and a numerical second-order derivatives matrix is calculated. In a second step, the frame is optimized again with the hybrid method by means of a quasi-Newton algorithm which uses the Hessian matrix determined previously. For every frame, the lattice energy, the first energy derivatives, the Hessian matrix, and the restraint terms are stored. When a single degree of freedom is driven from one value to another, it may happen that another degree of freedom crosses an energy barrier and changes abruptly by a large amount, which leads to an energy discontinuity and complicates the interpretation of the wide amplitude data. Typically, restraint terms are defined for all relevant wide amplitude motions in a molecule. One of these constraint terms is driven to produce the wide amplitude data set, whereas the others are held constant to stabilize the rest of the molecule. For every frame, wide amplitude data provide

Tailor-Made Force fields for Crystal-Structure Prediction energies and energy derivatives that both include a contribution from the additional restraint terms. 3.7. Reverse Minimization data. This data type is used to remove false energy minima from the potential-energy hypersurface of the tailor-made force field. After a first parameter refinement, it may happen that the tailor-made force field predicts energy minima for molecular conformations or packings which, in reality, are unstable. Such false minima may be spotted by following calculations with the hybrid method or on the basis of personal experience. To remove false minima, energies and energy derivatives calculated with the hybrid method at the corresponding locations must be fed back into the parameter refinement. To generate reverse minimization data, some true and some false minima are selected, the corresponding structures are optimized with the tailor-made force field, and the Hessian matrix is calculated. In a second step, single-point energy calculations with the hybrid method are carried out to compute accurate energies and energy derivatives for the selected configurations. 3.8. Interdependencies between Reference Data Generation and Parameter Refinement. It is important to understand that the aforementioned data sets make it possible to split the force-field parameters into several groups which are refined separately. The order of the parameter refinement is of crucial importance. First, bond increments are derived from the electrostatic data. Then, the vdW parameters are fitted to rigid-molecule minimization data and expansion data, eventually complemented by packing data. No parameters for bonded interactions are required to fit these data sets, because the molecular geometries are kept constant within each data set. In a third step, bond increments and vdW parameters are optimized together by using electrostatic data, minimization data, packing data, and expansion data. In a fourth step, the bonded energy terms, that is, the bond-stretch terms, angle-bend terms, overall-inversion terms, overalltorsion terms, angle bendsinversion coupling terms, and intramolecular scale factors for Coulomb and vdW interactions, are fitted to conformation data only, with the bond increments and vdW parameters being held constant. The conformation data generated with the hybrid method contain a contribution from the intermolecular interactions between the translational copies of each molecule. In the force-field parameter refinement, this contribution is compensated by the Coulomb and vdW terms which have already been determined accurately at this stage. Because the description of Coulomb and vdW interactions is not perfect, a small error is introduced in the bonded energy terms that is related to the uncompensated part of the intermolecular interactions. At this point, the tailor-made force field is already fairly accurate, but the potential-energy barriers between molecular conformations are not yet well defined. The intermediate tailor-made force field is used for the force-field calculations involved in the generation of wide amplitude data. This choice minimizes the CPU-time requirements of the hybrid method, because the preoptimized frames and the corresponding numerical Hessians provide an excellent starting point. The lack of precision of the intermediate force field with respect to wide amplitude motions is of no concern, because the corresponding degrees of freedom are taken care of by the additional restraint terms. In a final refinement step, the bonded energy terms are fitted again, now taking into account conformation data and wide amplitude data. 3.9. Reference Structure Selection. The selection of rigidmolecule packings for the generation of minimization data

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9819 is a subtle issue which deserves a detailed discussion. It has already been pointed out in Section 2.3 that many heteroatomic vdW parameters should be refined explicitly. As a consequence, the reference data and in particular the rigidmolecule minimization data must be chosen such that all vdW potentials are well defined over the full range of intermolecular distances. Large interatomic separations are not a problem, because they occur in large numbers, but short contacts are rare events, and special care must be taken to ensure complete coverage. The objective to cover all possible short contacts is further complicated by the fact that most organic molecules consist of a majority of carbon and hydrogen atoms, complemented by a small amount of other elements such as oxygen, nitrogen, sulfur, fluorine, or chlorine. To improve the coverage of close contacts between these minority atoms, it is advisable to construct several smaller molecules which contain an increased portion of minority atoms and exhibit the same chemical motives as the full molecule. For the sake of simplicity, these molecules will be called molecular fragments in all subsequent discussions. In the parameter refinement, the same vdW parameters are used for the molecular fragments and the full molecule. If appropriate, the molecular fragments and the full molecule may also share some bond increments. Electrostatic data have to be generated for all molecular fragments. Using fragments for the purpose of parametrization is practically unavoidable when confronted with larger molecules but somewhat contrary to the spirit of this paper, because it assumes the transferability of force-field parameters between the fragments and the full molecule. However, transferability is assumed only for atoms with exactly the same chemical environment up to first or even second next neighbors. Currently, crystal packings are generated for all molecular fragments, all pairs of fragments, and the full molecule if it is not too large. The structure generation is carried out with rigid molecules in all space groups with no more than two symmetry copies per reduced cell (P1, P1j , P2, P21, Pm, Pc, C2, Cm, Cc). The structure-generation algorithm requires a force field for nonbonded interactions. A combination of bond increments and vdW parameters is used, where the bond increments are fitted to electrostatic potential data and the vdW parameters are taken either from the Dreiding force field or from a previous force-field parameter refinement for a related molecule. All generated crystal packings result from a rigid-molecule optimization. It would be far too time-consuming to prepare minimization data for all generated crystal structures. Instead, it is necessary to select as small a set of structures as possible that covers all possible close contacts. An automatic selection procedure has been implemented, which works as follows. Around every atom, several sectors are defined according to the number of covalently bonded neighbors (see Figure 7). All distances between atoms on different molecules are divided into categories, where every category consists of the primary FFAT of a first atom, one of its sectors, and the primary FFAT of a second atom. In a first step, the selection algorithm runs through all crystal structures and determines the shortest interatomic separation, rmin,ζ, for every distance category ζ. In a second step, the algorithm runs through all crystal structures again and removes the current structure if for every distance category, there are at least n remaining crystal structures with a close contact

rζ e rmin,ζ(1 + t)

(28)

A typical parameter choice is n ) 2 and t ) 0.05. The result of the above procedure strongly depends on the order in which

9820 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann every atom in a crystal structure is in close contact with other atoms on neighboring molecules. The need for the reference data to cover all relevant atomic configurations is not limited to minimization data but applies to all data types. Electrostatic data and conformation data must be generated for all relevant conformations of a molecule. Wide amplitude data are required for all relevant conformational changes. At present, the responsibility for the generation of all required data sets lies with the user, but a conformer analysis tool has been implemented to facilitate the task. 4. Deviation Function

Figure 7. Sector definitions for close contact analysis.

the various crystal structures are dealt with. All crystal structures are classified with respect to a risk factor, where structures with a high risk factor are removed first. The risk factor reflects the probability of a large structural change occurring when the selected structures are optimized with the hybrid method to produce rigid-molecule minimization data. Each selected crystal structure is supposed to cover a certain number of close contacts, and a large structural change is likely to invalidate the very reason for which the structure was selected. An appropriate quantitative measure of the risk factor is provided by the inverse of the number of times that a structure is found in a random structure generation process, which is an indicator for the size of the attractive basin. 3.10. Additional Issues. If the rigid-molecule minimization data contain only crystal structures optimized under ambient pressure, the resulting vdW potentials are in general inappropriate for the description of intramolecular third next neighbor interactions. Indeed, intramolecular third next neighbor distances are significantly smaller than typical intermolecular distances under ambient pressure. To improve the description of third next neighbor interactions, rigid-molecule minimization data are generated at ambient pressure and also at 5 GPa. At this pressure, the closest intermolecular distances are roughly equal to typical intramolecular third neighbor distances. Rigid-body minimization data and packing data can both be used for the refinement of nonbonded interactions. Packing data provide a more detailed characterization of each crystal packing but also require significantly more time to be generated. If the number of distance categories to be considered is small, packing data are the preferred choice. In most cases, however, the number of distance categories is rather high, and rigid-body minimization data are the only practical option, because a large number of crystal structures is required to cover all relevant close contacts. Compared to the alternative option of fitting nonbonded interaction potentials to reference data for bimolecular complexes, minimization data present the advantage of a significantly enhanced signal-to-noise ratio, because virtually

For force-field parameter refinement by means of a computer algorithm, it is necessary to define a mathematical function that characterizes the disagreements between the various reference data types and their force-field counterparts by a single number. This deviation function, D, needs to be constructed in such a way that it is 0 for a perfect match and >0 otherwise. The problem with the construction of the deviation function is that it has to combine a multitude of disparate data types (electrostatic potential, energies and forces at the potential energy minima and at surrounding points, data for densely packed and expanded crystal structures, and data for the full molecule and smaller fragments) in a balanced way such that no data type dominates the others. In order to assess the contribution from the various data types on a common scale, all disagreements are expressed in terms of energy errors per atom. Whenever possible, the energy errors are calculated in such a way that there is a direct link with the lattice energy ranking of molecular crystals. However, when reading through the description of the various terms of the deviation function presented in Appendix A, it is important to keep in mind that the main objective is not to calculate accurate physical properties but to compare reference data and their force-field counterparts in a balanced way. Therefore, the derivation of many terms of the deviation function starts from physical principles which are then modified or combined as is best for capturing the differences between energies and energy derivatives calculated with the fitted force field and the hybrid method. The dimensionless deviation function is the weighted average over several contributions from different energy errors:

[

ees(q b) em,1(q b) 1 w + wm,1 2 + W es e2 e

D(q b) )

2

2

ref

ref

wm,2

e2p,1(q b) em,2(q b) + wp,1 2 + eref e ref

wp,2

ep,2(q b) eref

e2c,1(q b) wc,1 2 eref wwa,1

e2p,3(q b) + wp,3 2 eref + wc,2

2 ewa,1 (q b)

e2ref

ec,2(q b) eref

+ wwa,2

+ wp,4

ep,4(q b) e2e,1(q b) + we,1 2 + eref e ref

e2c,3(q b) + wc,3 2 eref

+ wc,4

ec,4(q b) + eref

2 ewa,2(q b) erm,1 (q b) + wrm,1 2 + eref e ref

]

erm,2(q b) e2constr(q b) + wconstr (29) wrm,2 2 eref e ref

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9821

W ) wes + wm,1 + wm,2 + wp,1 + wp,2 + wp,3 + wp,4 + we,1 + wc,1 + wc,2 + wc,3 + wc,4 + wwa,1 + wwa,2 + wrm,1 + wrm,2 + wconstr (30) The indices of the energy errors, etype,n or etype, used in the above equation, have the following meaning: type can be es ) electrostatic potential, m ) minimization data, p ) packing data, c ) conformation data, e ) expansion data, wa ) wide amplitude data, rm ) reverse minimization data, or constr ) constraint terms. The additional subtype index, n, takes values from 1 to 4, which indicate, respectively, energies at potentialenergy minima, forces at potential-energy minima, energies around minima, and forces around minima. The vector b q represents the refinable force-field parameters. Some of the terms in eq 29 are proportional to the energy error, whereas others are proportional to its square (see Appendix A.2 for an explanation). To combine both types of terms in a single expression, they are divided by a reference energy, eref, or the square of the reference energy, respectively. The reference energy, eref, can be interpreted as the target energy error per atom. A value of eref ) 0.01 kcal/mol is in general appropriate and yields a value close to 1 for the deviation function after parameter refinement. The parameters wtype,n or wtype are userdefined weights. In principal, the deviation function is constructed in such a way that all weights can be set to 1, but as is discussed in Section 8, there are valid reasons to reduce the importance of certain terms in some cases. The interested reader will find a detailed description of the various terms of the deviation function in Appendix A. 5. Parameter-Optimization Algorithm In general, force-field parameters are strongly coupled, and an optimization algorithm for force-field parameter refinement must be able to find its way through long, curved valleys with steep sides in a parameter space of more than 100 dimensions. A two-step procedure is used, consisting of a Powell optimization and a quasi-Newton optimization with numerical first- and second-order derivatives. A major problem for force-field parameter refinement are directions in the parameter space along which several forcefield parameters can change substantially without having a significant impact on the value of the deviation function. The two possible origins of such soft directions are incompleteness of the reference data and redundant force-field energy terms. It is frequently not possible to eliminate soft directions simply by the generation of more reference data or by the elimination of all redundant energy terms, because the data generation may be too time-consuming or because redundant energy terms allow for a mathematical structure of the force field that is convenient to use. Soft directions, if not dealt with appropriately, typically result in unphysical force-field parameters, that is, values that deviate from commonly admitted parameter ranges for bond distances, angles, force constants, torsion barriers, and other parameters. Unphysical parameters resulting from true redundancies are in principle not a problem, because the sum of all energy terms should yield an appropriate description of the potential energy in this case. However, unphysical parameters may also result from poor reference data or lack of convergence, in which case energy calculations for structures outside the training set may give disastrous results. Hence, physically reasonable parameter ranges should always be imposed as a safeguard against severe energy errors.

The notion of physically reasonable values is not limited to individual parameters but should also be applied to parameter sequences and certain groups of parameters. Different strategies are used to deal with these categories. For every individual forcefield parameter, a lower bound and an upper bound are defined, where the bounds must be chosen generously to accommodate the full range of admissible values. In cases where sequences of force-field parameters are expected to exhibit a certain regularity, these parameters can be defined as a function of a reduced number of other parameters which are adjusted in the parameter refinement. For instance, one may impose that the homoatomic vdW parameters Bii (see eq 9) are a linear function of the number of valence electrons, nval,i, for the elements C, N, O, and F

Bi,i ) a + nval,ib

(31)

where a and b are the adjustable parameters in the parameter refinement. For certain groups of parameters, the relationships to be satisfied may be too complex to be dealt with by means of a user-defined function. One particular important example are the reference angles, θ0, of the angle-bend terms around a common central atom with four covalently bonded neighbors. In this case, only five out of the six angles formed by the covalent bonds are required to describe the atomic arrangement around the central atom. If all six reference angles are allowed to vary independently, the reference angles frequently move away substantially from the ideal tetrahedral angle of 109.5°, giving rise to a situation where the overall geometry results from the equilibrium between six highly strained angle-bend terms. Most of the strain can be avoided by forcing the six reference angles to take values that correspond to a geometrically possible atomic arrangement around the central atom, and a special restraint term has been added to the deviation function for this purpose (see Appendix A.8). Planar arrangements of a central atom with three covalently bonded neighbors and planar rings are dealt with in a similar fashion, even though the use of a user-defined function is a valid alternative in this case. 6. A Recipe The amount of detail exposed in the previous sections may not permit an easy understanding of the work flow involved in the elaboration of a tailor-made force field. A summary of the overall procedure in the form of a recipe, which is applied to cyclohexane-1,4-dione in Section 8 is given below. The recipe is valid for a flexible molecule that is small enough to allow for the explicit consideration of all molecular conformations. The molecule is divided into rigid fragments to obtain a better coverage of intermolecular close contacts. 1. Build the molecule and molecular fragments, if required. 2. Generate FFATs and molecular typing rules. The molecule and the fragments either share some primary FFATs or some full FFATs. 3. Perform a conformer analysis of the molecule by using a standard transferable force field. 4. Optimize the geometries of all conformers (of the full molecule) and of the fragments with the hybrid method by using a large simulation box with periodic boundary conditions. If some conformers are identical after the optimization, the duplicates have to be discarded. One may also discard energetically unfavorable conformations that have no chance to be observed experimentally. 5. Generate electrostatic data for all conformations and fragments.

9822 J. Phys. Chem. B, Vol. 112, No. 32, 2008 6. Fit bond increments to the electrostatic data. The molecule and the fragments may share some bond increments. 7. Take vdW parameters from a standard force field or from a previous parameter refinement. In conjunction with the bond increments from Step 6, a rough force field suitable for rigid molecules has been created. 8. Generate rigid-molecule crystal packings at 0 GPa for each fragment, each pair of fragments, and selected conformations of the full molecule in all space groups with no more than two symmetry copies in the reduced cell. 9. Select an appropriate number of structures from the crystal packings generated in Step 8 to cover all relevant close contacts. 10. Generate rigid-body minimization data for all structures selected in Step 9. 11. Repeat Steps 8-10 for a pressure of 5 GPa. 12. Generate expansion data for some or all of the structures optimized in Step 10. 13. Fit the vdW parameters to rigid-body minimization data (Steps 10 and 11) and expansion data (Step12). 14. Simultaneously fit the bond increments and vdW parameters to electrostatic data (Step 5), rigid-body minimization data (Steps 10 and 11), and expansion data (Step12). The tailormade force field is now fully optimized for intermolecular interactions. 15. Generate conformation data for all molecular conformations. 16. Fit the bonded force-field parameters to conformation data (Step 15). 17. Generate wide amplitude data for all relevant conformational changes. 18. Fit the intramolecular force-field parameters to conformation data (Step 15) and wide amplitude data (Step 17). The tailormade force field is now fully optimized for intra- and intermolecular interactions. 19. Perform a conformer analysis with the tailor-made force field. If you find more conformations than at the end of Step 4, special action is required which will be discussed below. 20. Generate crystal structures in P1 or in all space groups with no more than two symmetry copies in the reduced cell, optimize the structures with the tailor-made force field and the hybrid method, and compare the lattice energies. This step provides you with a first, rough estimate of the overall accuracy of the tailor-made force field. If the conformer analysis in Step 19 yields additional conformations, all new conformations have to be optimized with the hybrid method. If the hybrid method confirms that a new conformation corresponds to a local minimum of the potentialenergy hypersurface, additional conformation data and wide amplitude data must be generated accordingly. In the opposite case, reverse minimization data need to be generated to flag the new conformation as a false local minimum. After all new conformations have been dealt with, the procedure continues with Step 18, taking into account the additional reference data as well as the conformation data and wide amplitude data already generated previously. Sometimes, it may be appropriate to feed the minimization data implicitly generated in Step 20 back into the parameter refinement. Because the lattice energies of flexible molecules are sensitive to all force-field parameters, it is necessary to refine all parameters simultaneously to all available reference data in this case. For highly flexible molecules, an extension of the above recipe is required, because it is practically impossible to consider all molecular conformations and all pathways between conformations explicitly. Instead, the full molecule has to be decom-

Neumann posed into smaller, flexible, overlapping, and chemically reasonable subunits, where the complexity of each subunit must allow for the complete characterization of its flexibility. Each of the subunits is then parametrized by following the above recipe, and in the end, the tailor-made force field of the full molecule is constructed from the tailor-made force fields obtained for the subunits. 7. Computer Hardware The calculations described in the next section have been carried out on a self-built LINUX PC cluster with 16 nodes, each consisting of a single Pentium IV processor running at 2.8-3.2 GHz, an ASUS P4P800-E motherboard, generic PC3200 memory with a total capacity of 2-3 GB, and a 40 GB hard disk. The nodes are connected via a standard gigabit ethernet. The cluster is managed by using the ROCKS 4.0.0 cluster toolkit.30 The time-consuming parts of the calculations involve the optimization of tens or hundreds of crystal structures with the hybrid method. In parallel computing terms, the tasks are of the embarrassingly parallel type, because it is sufficient to dispatch the structure optimization jobs over the available nodes, where each node independently deals with the optimization of a single structure. Hence, the system performance scales about linearly with the number of nodes. Strictly speaking, this is only true for crystal structures with reduced cell volumes up to approximately 1500 Å3. Above, the memory requirements of the VASP DFT code exceed the amount of memory available on a single node with a 32 bit operation system. In the range of 1500-3000 Å3, VASP is run in parallel on two-four nodes. On the hardware described above, it is difficult to go beyond reduced cell volumes of 3000 Å3. More than four nodes would be required to satisfy the overall memory requirements, but the system scales poorly beyond two nodes because the gigabit ethernet does not offer enough bandwidth. The practical limit for the approach presented in this paper clearly lies much higher than 3000 Å3, because 64 bit nodes with 10Gb ethernet connections and plenty of memory are nowadays readily available. 8. A Case Study: Cyclohexane-1,4-dione Cyclohexane-1,4-dione is an example for a flexible ring system. In its category, cyclohexane-1,4-dione is fairly challenging, because the molecular conformation in the experimental crystal structure31 measured at 133 K is distorted significantly with respect to the corresponding gas-phase conformation, as shown in Figure 8. The FFATs generated for cyclohexane-1,4-dione are shown in Figure 2d. Because the molecule is rather small and contains at least two copies of every atom type, molecular fragments are not required. Electrostatic data were generated for conformations 1a and 2a in Figure 8. Rigid-body minimization data at ambient pressure and 5 GPa, as well as expansion data, were prepared for 34 different crystal packings. These data were used to fit 14 vdW parameters and three bond increments. A relatively small weight of 0.1 and 0.02 was used for the pressure data and the expansion data, respectively. The simple nonbonded potentials of the current force-field implementation do not allow for an accurate description of lattice energies over a large density range. Pressure data and expansion data are required to obtain a balanced parameter set but must be given a small weight to

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9823

Figure 9. Frames from the wide amplitude data set for the transition 2a f 2b. All molecules are contained in a simulation box with periodic boundary conditions which is not shown.

Figure 8. Conformations of cyclohexane-1,4-dione. The gas-phase conformations 1a and 1b as well as 2a and 2b are symmetry-equivalent.

avoid a negative impact on the energy accuracy of the tailormade force field with respect to lattice energy ranking at ambient pressure. Conformation data were generated for conformers 1a and 2a, and wide amplitude data were produced for the transitions 1a f 1b, 1a f 2a, and 2a f 2b with 17 frames per transition. In order to illustrate what wide amplitude data look like, Figure 9 shows five out of 17 frames for the transition 2a f 2b. All molecules are contained in a simulation box (not shown) with periodic boundary conditions. The conformation data and wide amplitude data were used to fit 8 bond-stretch parameters, 12 angle-bend parameters, 6 overall-torsion parameters, and 1 overall-inversion parameter. Other intramolecular terms and parameters were not required. The tailor-made force field for cyclohexane-1,4-dione is submitted as Supporting Information. With the tailor-made force field, crystal structures were first generated in space group P1 alone. The six obtained crystal structures were reoptimized with the hybrid method, resulting

in a root-mean-square deviation of 0.47 kcal/mol per molecule between the energy ranking with the tailor-made force field and that with the hybrid method. Then, a quasi-complete structure generation was carried out in all 230 space groups with one molecule per asymmetric unit, yielding 8 and 44 structures in an energy window of 0.5 and 1.0 kcal/mol, respectively. The structure generation engine produces crystal structures that are fully optimized with the tailor-made force field. The most stable generated structure turned out to be the experimental structure. The 57 most stable generated crystal structures from an energy window slightly larger than 1.0 kcal/mol were reoptimized with the hybrid method. Again, the experimental crystal structure was found as the most stable predicted crystal structure. Figure 10 compares the lattice energies obtained with the tailor-made force field and the hybrid method. On average, both methods differ by σE ) 0.53 kcal/mol per molecule. Comparing this value to the lattice energy spacing just above the most stable crystal structure, it becomes clear that some luck was involved in the successful ranking with the tailor-made force field, because in the list of lattice energies calculated with the hybrid method, there are three structures within only one standard deviation of the energy error, σE. The experimental crystal structure and the corresponding structures predicted with the tailor-made force field and the hybrid method are compared in Table 2. The experimental structure crystallized in space group P21, with unit cell parameters a ) 6.65 Å, b ) 6.21 Å, c ) 6.87 Å, and β ) 99.82°. Table 2 presents the relative differences of the lattice parameters, the unit cell deformation ∆ as defined in ref 4, and the root-mean-square deviation of the Cartesian coordinates of

9824 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann the additional minimization data concentrate on the most stable crystal packings which share a certain number of favorable packing motifs. The 26% difference probably is the price to pay for an equilibrated coverage of a large spectrum of molecular arrangements. In some cases, it may be appropriate to run two full crystal-structure predictions, one with the initial tailor-made force field and one with the improved tailor-made force field obtained after feedback of the results from the first prediction into the parameter refinement. Such a procedure may increase the overall chance of finding the most stable crystal structure. Overall, the study on cyclohexane-1,4-dione required about 14 days of CPU time at full load on the cluster described in Section 7. 9. Accuracy of Tailor-Made Force Fields

Figure 10. Lattice energies found for cyclohexane-1.4-dione with the tailor-made force field and the hybrid method.

non-hydrogen atoms. According to the last two columns, which contain the most pertinent indicators, the result of the crystal structure optimization with the hybrid method matches the experimental crystal structure fairly well, whereas the disagreements are about 2 times larger for the tailor-made force field. Figure 11 shows a superposition of the experimental crystal structure and the most stable predicted crystal structure according to the hybrid method. With respect to the main focus of this work, it is important to check whether the recipe described in Section 6 results in an overall accuracy close to the optimum within the limits of the chosen mathematical model. In order to investigate this issue, 29 of the 57 crystal structures optimized with the hybrid method were fed back into the parameter refinement as minimization data. Minimization data for fully flexible crystal structures close to the global energy minimum can be expected to be the best possible data to parametrize a force field dedicated to crystalstructure prediction. By using the additional minimization data, two different parameter refinements were carried out. In one case (weak feedback), the additional minimization data were given approximately the same weight as the original rigidmolecule minimization data. In a second refinement (strong feedback), the weight of the additional minimization data was increased by about a factor of 5 compared to all other data types. After parameter refinement, the remaining 28 crystal structures not included in the refinement as minimization data were optimized with the two new variants of the tailor-made force field. For the force field according to the standard recipe and the two new variants, a comparison with the lattice energies from the hybrid method for the 28 remaining structures yields rootmean-square deviations of 0.54 kcal/mol (recipe), 0.46 kcal/ mol (weak feedback), and 0.40 kcal/mol (strong feedback). If the weight of the additional minimization data is increased even further, the energy error starts to increase again because the force-field parameters are not sufficiently well defined by the additional minimization data alone. Compared to the standard recipe, even strong feedback reduced the energy error by only 26%. This is not a lot, taking into account that the recipe involves a large spectrum of different crystal packings, whereas

At the time of writing, the recipe outlined in Section 6 and exemplified in Section 8 has been applied to four additional test cases, namely compounds XII, XIII, XIV, and XV of the fourth CCDC blind test on crystal-structure prediction.25 Table 3 lists the elements contained in each compound together with the energy deviation per atom between each tailor-made force field and the hybrid method. It turns out that the accuracy of the tailor-made force fields spans a range from 0.024 to 0.053 kcal/mol per atom around an average value of 0.034 kcal/mol per atom. To avoid misinterpretation, it is important to comment on the methodology employed for the assessment of the force-field accuracy. The lattice energy of each crystal structure resulting from the structure generation procedure is fully optimized with respect to the tailor-made force field. All generated crystal structures are unique, and the 25-100 most stable generated structures are reoptimized with the hybrid method. Then, two sets of lattice energies obtained with the tailor-made force field and the hybrid method are compared, and the energy offset between the two energy scales as well as the root-mean-square deviation between the lattice energies are calculated. However, some crystal structures need to be excluded from the comparison for the reason that is illustrated in Figure 12. It happens that structures generated with the tailor-made force field correspond to local minima separated from deeper minima by a small energy barrier only, which disappears when switching from the forcefield potential-energy hypersurface to the hypersurface of the hybrid method. Minimizing these structures again with the hybrid method can result in large energy changes, even though the potential-energy hypersurfaces of the tailor-made force field and the hybrid method are fairly similar. Therefore, each crystal structure is optimized three times: once with the tailor-made force field during structure generation, then with the hybrid method, and finally with the tailor-made force field again. If the second optimization with the tailor-made force field does not lead back to the crystal structure obtained after the first optimization with the tailor-made force field, the structure is excluded from the lattice energy comparison. Typically, 5-10% of the considered structures have to be excluded. 10. Conclusions Together with the hybrid method4 and a new structure generation engine, the methodology for the parametrization of force fields on a molecule-by-molecule basis described in this paper presents a major breakthrough in the field of crystalstructure prediction. The excellent result obtained for cyclohexane-1,4-dione is representative for the overall performance of the approach, as confirmed by the outcome of the fourth

Tailor-Made Force fields for Crystal-Structure Prediction

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9825

TABLE 2: Comparison of the Calculated and the Experimental Crystal Structures for Cyclo-hexane-1,4-dionea structure 1

structure 2

∆arel [%]

∆brel [%]

∆crel [%]

∆βrel [%]

∆ [%]

σ{rb} [Å]

experimental experimental force field

force field hybrid method hybrid method

-0.12 0.01 0.12

-0.29 1.05 -0.76

-0.21 1.44 1.64

2.75 0.37 -2.38

5.07 2.65 5.16

0.093 0.039 0.101

a For each pair of structures, ∆arel, ∆brel, ∆crel, and ∆βrel are the relative changes of the lattice parameters. ∆ is the unit cell deformation according to ref 4, and σ{rb} is the RMSD of the Cartesian coordinates of the non-hydrogen atoms.

Figure 11. Superposition of the experimental crystal structure of cyclohexane-1,4-dione and the most stable predicted crystal structure according to the hybrid method.

TABLE 3: Accuracy of Tailor-Made Force Fields Obtained by Comparison with the Hybrid Method compound

elements

accuracy [kcal/mol per atom]

cyclohexane-1,4-dione compound XII compound XIII compound XIV compound XV

H, C, O H, C, O H, C, F, Cl, Br H, C, N, S H, C, N, O

0.034 0.025 0.053 0.024 0.032

CCDC blind test on crystal-structure prediction published elsewhere. 25 Although substantial progress has been made with respect to the crystal-structure prediction of small, moderately flexible organic molecules, further methodological and algorithmic

Figure 12. Barrier loss results in overestimation of the force-field error (see text for details).

improvements are required to extend the applicability of the approach to several molecules per asymmetric unit, highly flexible molecules, salts, solvates, and cocrystals. The number of structures found in a given energy window strongly increases with the number of structural degrees of freedom. For dealing with more and more complex compounds, it is crucial to further improve the accuracy of the tailor-made force fields in order to keep the number of structures to be optimized with the hybrid method at a minimum. The accuracies obtained for several molecules with atomic point charges, isotropic vdW interactions, and essentially uncoupled bonded terms provide a solid benchmark against which the added benefit of more complex concepts such as electrostatic multipoles, atomic polarizabilities, chemical potentials, or anisotropic vdW interactions can be tested. The parametrization strategy described in this paper can be easily generalized toward more complex mathematical models with an increased number of parameters, thus allowing for the assessment of the true potential of more sophisticated forcefield implementations. In addition to improved accuracy, further automation of the parametrization procedure is the second most important challenge. The current degree of automation requires the involvement of an expert user who chooses the reference data to be generated, selects the force-field parameters to be refined, and executes individual computational tasks. For the widespread

9826 J. Phys. Chem. B, Vol. 112, No. 32, 2008

Neumann

application of tailor-made force fields, complete automation needs to be achieved, such that tailor-made force fields are as convenient to use as transferable force fields, the only difference being the initial setup time required for force-field parametrization. Appendix A In the following subsections, the various terms of the deviation function are discussed in detail. All these terms involve additional weights, wj, which are the same for all structures in a similarity set. As the deviation function itself, each of its constituent terms is designed in such a way that setting the weights of all similarity sets to 1 is a reasonable choice, unless there are good reasons to give some data sets a lower rating. A.1. Electrostatic Potential Data. By using a test charge, Q, the differences in the electrostatic potential for the fitted force field and the hybrid method can easily be converted to an average energy error, εi, for a crystal structure, i:

∑mk,iQ2(φff,k,i(q b) - φref,k,i - φoffset,i)2

k εi2(q b) )

∑mk,i

(A1)

k

Here, mk,i is the multiplicity of the point k in the unit cell of structure i. φoffset,i is a constant offset depending on the normalization of the electrostatic potential values calculated with the two methods. Typically, a test charge of Q ) 0.3 e is used. In general, the overall electrostatic data term must take into account contributions from several crystal structures: Nj-1

e2es(q b) )

2 (q b) ∑wj ∑ nasym,i,jεi,j j

i)0 Nj-1

(A2)

∑wj ∑ nasym,i,j j

i)0

The parameter nasym is the number of atoms per asymmetric unit. As in other terms of the deviation function described in the following subsections, the index j runs over all similarity sets, whereas the index i covers all Nj structures in set j. Initially, eqs A1 and A2 were used to evaluate the electrostatic contribution to the deviation function, but this choice turned out to be inconvenient. Indeed, even if all atomic charges are allowed to vary freely, εi is much larger than eref in most cases because point charges do not describe the electrostatic potential very well, and the electrostatic term tends to dominate the deviation function. An alternative indicator, ε˜ i, has therefore been developed:

˜εi2 ) eref

2 εi2 - εmin,i eref + 2εmin,i

(A3)

In the above equation, εmin,i is the value that is obtained for εi if the atomic charges of structure i are fitted to the electrostatic potential without any additional constraints. For εmin,i f 0, that is, if point charges in principle allowed for a perfect fit, the expression returns to the intuitive choice ε˜ i ) εi. If a parameter refinement is more or less converged, εi ≈ εmin,i . eref is typically observed, and eq A3 reduces to

˜εi2 ≈ eref

εi + εmin,i (ε - εmin,i) 2εmin,i i

(A4)

and, ε˜ 2i is essentially proportional to the additional misfit, εi εmin,i. Bond increments fitted to the electrostatic potential alone are not necessarily the best choice for the calculation of accurate lattice energies. For atoms participating in hydrogen bonds, for instance, it is more important that the combination of Coulomb

and vdW interactions yields a balanced description of the energies of the different hydrogen bonds that can be formed. The question then is how far the bond increments are allowed to deviate from their ideal values with respect to the electrostatic potential, and the additional misfit captured by eq A3 provides an appropriate restraint. In eq A2, ε2i has to be replaced by ε˜ 2i . A.2. Minimization Data. Minimization data provide target values for lattice energies and energy gradients at local energy minima. The term em,1 in the deviation function compares lattice energies calculated with the fitted force field and with the hybrid method. 2 em,1 (q b) )

[

]

Nj-1 E Eref,i,j b) ff,i,j(q 1 ∑wj ∑ - eoffset,j ∑wjNj j i)0 ncell,i,j ncell,i,j j

2

(A5) As in the case of electrostatic data, the index j runs over all similarity sets for which minimization data have been generated, whereas the index i covers all Nj structures contained in set j. Eff,i,j and Eref,i,j are the lattice energies for structure i in set j calculated with the force field and the hybrid method, respectively, and ncell,i,j is the number of atoms per unit cell. The energy offset eoffset,j requires some special attention. For model systems consisting of the same molecular components, there is in general an unknown but constant offset between the energies calculated with a force field and those calculated with the hybrid method (or any pure DFT code). This offset can be traced back to the fact that the energy terms of the force field are not calibrated with respect to the energy that is released when atoms come together to form a molecule. For practical applications of force-field calculations, this calibration is not required, because force fields are used to compute energy differences between different molecular arrangements or conformations, where the molecules under consideration remain the same. Hence, the energy offset affects all states in the same way and drops out of the equation. In the parameter refinement presented here, on the contrary, the offset needs to be taken into account explicitly. From a computational point of view, the easiest choice would have been to adjust all offsets eoffset,j independently as to minimize eq A5. However, the use of too many adjustable offsets may result in a loss of essential information. Imagine a case where several crystal structures have been generated for three different cases: a molecule D with an hydrogen-bond donor, a molecule A with an hydrogen-bond acceptor, and a pair of A and D. If the hydrogen bond is formed in all structures generated for the AsD pair and if the offset used for these structures is not related to the two offsets for the structures containing only A or D, the information about the hydrogenbond formation energy is lost. To avoid this kind of problem, the offset for each similarity set is calculated from molecular offsets.

eoffset,j )

∑Mk,jmkemolecule,k k

∑Mk,jmk

(A6)

k

Here, emolecule,k is the energy offset per atom of molecule k, mk stands for the number of atoms in molecule k, and Mk,j is the number of molecules k in the chemical unit that is common to all structures of the similarity set j. Different offsets must be used for flexible and rigid variants of one and the same molecule. Five terms of the deviation function require molecular offsets, namely e2m,1, e2p,1, e2c,1, e2wa,1 and e2rm,1. The molecular offsets are treated as free parameters which are adjusted at every

Tailor-Made Force fields for Crystal-Structure Prediction evaluation of the deviation function such that it adopts the smallest possible value. The term em,2 in the deviation function deals with energy gradients. Each structure in a minimization data set corresponds to a potential-energy minimum according to the hybrid method and an approximate Hessian matrix H is available. In general, the energy minima of the tailor-made force field and the hybrid method do not quite agree. At the minimum according to the hybrid method, the force field typically yields a nonzero gradient, b gff, and lattice energy minimization with the force field would yield an energy decrease of approximately

1 T -1 ∆E ) b gff g H b 2 ff

(A7)

The expected energy change upon minimization is used to convert energy gradients to energy errors per atom. For a fully optimized structure, b gff in eq A7 can be replaced by b gff - b gref because b gref is zero. The new term has the advantage that it is appropriate for use in the deviation function even if the structure optimization with the hybrid method has not fully converged, because it is always zero when the two gradients b gff and b gref are equal and larger than zero otherwise. However, the new term has no well-defined physical meaning if the structure used for the calculation of the two gradients is not fully optimized with respect to the hybrid method. Summing over all structures in all minimization data sets yields

em,2(q b) ) -1 Nj-1 (g bff,i,j(q b) - b gref,i,j)THi,j (g bff,i,j(q b) - b gref,i,j) 1 ∑w ∑ (A8) ∑wjNj j j i)0 ncell,i,j j

Here the meaning of the sums and indices is the same as in eq A2. With the above discussions in mind, it is now possible to explain why some terms in the deviation function, such as e2m,1, are squared, whereas others, such as em,2, are not. By assuming that there is a perfect match with the reference data, e2m,1 and em,2 are both zero. If one force-field parameter, q, is changed by a small amount ∆q, the lattice energies and energy gradients change by (∂Eff,i,j/∂q)∆q and (∂g bjj,i,j/∂q)∆q, respectively. By construction, the increase of both e2m,1 and em,2 is proportional to the second power of ∆q. If em,1 were to be used instead of e2m,1 or e2m,2 instead of em,2, one of the terms would depend more strongly on ∆q than the other. This is undesirable if energies and gradients are to be taken into account in a balanced fashion. It is a general feature of the deviation function that terms dealing with energies are squared, whereas terms related to forces are not. It is worthwhile to clarify the relationship between the terms e2m,1 and em,2 and the energy ranking of molecular crystals. The term e2m,1 captures the difference between lattice energies calculated with the tailor-made force field and the hybrid method for a set of fixed crystal structures which correspond to local minima according to the hybrid method. The term em,2 takes into account the energy shift of these structures upon optimization with the tailor-made force field. The energy difference between the lattice energy minima of the hybrid method and the tailor-made force field is not directly considered in the deviation function. A.3. Packing Data. The first two terms, e2p,1 and ep,2, are calculated according to eqs A5 and A8. For the third term, e2p,3, which takes into account lattice energies calculated at points around local minima, a different approach is taken. The energies at surrounding points characterize the steepness of the potential-

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9827 energy well. In the context of lattice energy ranking, these energies are relevant when it comes to the calculation of the vibrational contribution to the lattice free energy. The differences between the fitted force field and the reference data have therefore been related to some kind of free-energy difference at room temperature. It must be stressed that the currently used crystal-structure prediction procedure does not involve free energies. The inclusion of a free-energy-like term in the deviation function should be seen as a preparation for future enhancements. It is assumed that the position of the local minimum is the same with the fitted force field and the hybrid method, where the two methods yield lattice energies Eff,min and Eref,min, respectively. At a point b V a distance ∆xbV away from the minimum, lattice energies Eff,Vb and Eref,Vb are obtained. The energy differences

∆EY,Vb ) EY,Vb - EY,min

(A9)

can be converted to force constants

KY,Vb ) 2∆EY,Vb ⁄ ∆xbV2

(A10)

and angular frequencies

ωY,Vb ) √KY,Vb ⁄ mbV

(A11)

where Y stands for ff or ref and mbV is the effective mass of the displacement from the minimum to the point b V in a multidimensional coordinate space. The free-energy difference at room temperature of two classical harmonic oscillators with angular frequencies ωff, bV and ωref,Vb is given by

( )

∆GbV ) kTRT ln



ωff,Vb ) kTRT ln ωref,Vb

∆Eff,Vb ∆Eref,Vb

(A12)

The singularity at ∆Eff,Vb ) 0 makes the above expression inconvenient for direct use in the deviation function. Instead, the first term of the Taylor expansion around ∆Eff,Vb ) ∆Eref,Vb is used. Dividing by the number of atoms per unit cell, ncell, and adding an additional factor 1/2 to reflect the fact that the point b V corresponds to only half a vibrational mode (for each mode, there are two points b V corresponding to different displacement directions) gives the free-energy difference per atom

∆gbV ) kTRT

∆Eff,Vb - ∆Eref,Vb 4∆Eref,Vbncell

(A13)

Assuming geometrical error propagation for the free-energy errors ∆gbV associated with a single structure and summing over all similarity sets, j, the corresponding structures, i, and the various points b V around the local energy minimum of structure i, one finally gets

e2p,3(q b) )

Nj-1 1 ∑wj ∑ ∑∆gbV2,i,j(q b) ∑wjNj j i)0 bV

(A14)

j

It is difficult to derive a physically meaningful formula that takes into account forces obtained at points around local minima. Starting from e2p,3 and by analogy with the change that leads from e2p,1 to ep,2, the following expressions can be derived.

1 -1 ˜ (q ∆E (q b) - b gref,Vb,i,j)THi,j (gff,Vb,i,j(q b) - b gref,Vb,i,j) b V,i,j b) ) (gff,V 2 b,i,j (A15) ∆g ˜bV,i,j(q b) )

(

)

˜ (q kTRT 2 ∆E b V,i,j b) 4∆Eref,i ncell,i,j

(A16)

9828 J. Phys. Chem. B, Vol. 112, No. 32, 2008

ep,4(q b) )

Neumann

Nj-1 1 ∑wj ∑ ∑∆g ˜ (q b) ∑wjNj j i)0 V bV,i,j

(A17)

j

A.4. Expansion Data. Expansion data are dealt with in very much the same way as the energies of minimization data.

[

Ni-1Mi,j-1 Eref,l,i,j Eff,l,i,j(q b) 1 1 ∑wj ∑ ∑ e2e,1(q b) ) ∑wjNj j i)0 l)0 Mi,j - 1 ncell,i,j ncell,i,j j

]

expected value. The overall restraint term is a weighted sum over individual restraint terms for atoms with four covalently bonded neighbors, c4,m, atoms with three covalently bonded neighbors in a planar configuration, c3,m, and planar rings, cring,m.

e2constr(q b) )

e2ref R2ref

1 N4 + N3 + Nring

[∑

N4-1



∆g′bV,i,j(q b) )

The relative importance of the overall constraint term is adjusted via the parameter Rref, which is typically set to 0.1°. For three atoms j, k, and l around a central atom i in a planar configuration with FFATs ti, tj, tk, and tl, the corresponding restraint is defined by

c3 ) (360° - θtjtitk,0 - θtjtitl,0 - θtktitl,0)2

(A23)

For a central atom with four covalently bonded neighbors, the restraint term is not that simple. In total, there are six anglebend terms with reference angles θi,0, where i now is an index that runs from 0 to 5 to simplify the notation. For a particular atomic arrangement, the covalent bonds form six angles Ri, where each angle can be calculated from the other five angles.

Ri ) Α(R0, . . . , Ri-1, Ri+1, . . . , R5)

(A24)

The function Α can be derived from geometric considerations. The above relationship is used to construct a constraint term for the reference angles, θi,0. 5

c4 )



1 (θ - Α(θ0,0, . . . , θi-1,0, θi+1, 0, . . . , θ5,0))2 6 i)0 i,0 (A25)

0.5 + exp(-pωref,Vb,i,j ⁄ kTRT) pωref,Vb,i,j ∆Eff,Vb,i,j(q b) - ∆Eref,Vb,i,j 1 - exp(-pωref,Vb,i,j ⁄ kTRT) 4 ∆Eref,Vb,i,jncell,i,j (A20)

For n-membered planar rings, constraint terms of the following type are used:

(

cring ) 180n - 360 -

and eq A16 by

(

]

cring,m(q b) (A22)

m)0

(A19)

Independent offsets are used because only the energy difference between the various states of expansion are important. The densely packed state is in general also part of a minimization data or packing data set. Using an offset linked to other data sets would result in double counting. A.5. Conformation Data. The energy terms for conformation data are constructed in analogy to the terms for packing data. However, because the vibrational energy levels of intramolecular modes are typically of the order of or larger than kTRT, eq A12 has to be replaced by the corresponding expression for the quantum mechanical harmonic oscillator. Following the derivation outlined in subsection A.3 gives expressions identical to eqs A14 and A17 for the terms e2c,3 and ec,4, but eq A13 is replaced by

∆g ˜′bV,i,j(q b) )

∑ c3,m(qb) +

m)0 Nring-1

2

Compared to eq A5, there is an additional sum over the Mi,j frames that take each crystal structure from the densely packed to the fully expanded state. The offset is now adjusted for every structure independently so as to minimize eq 18.

Eref,l,i,j b) 1 Mi,j-1 Eff,l,i,j(q ∑ Mi,j l)0 ncell,i,j ncell,i,j

c4,m(q b) +

m)0

eoffset,i,j (A18)

eoffset,i,j )

N3-1

n-1

∑ θt i)0

)

2

(0+i)modnt(1+i)modnt(2+i)modn,0

(A26)

)

˜ (q 0.5 + exp(-pωref,Vb,i,j ⁄ kTRT) pωref,Vb,i,j 2 ∆E b V,i,j b) (A21) 1 - exp(-pωref,Vb,i,j ⁄ kTRT) 4∆Eref,Vb,i,j ncell,i,j

Here t0-tn-1 are the FFATs of the ring atoms and i mod n is the remainder of the integer division i/n.

A.6. Wide Amplitude Data. Wide amplitude data are integrated in the deviation function in exactly the same way as minimization data (see eqs A5 and A8), where each sequence of frames forms a similarity set. It is important to note that each frame corresponds to a stable structure because of the presence of additional, restraint terms. The restraint terms contribute to all energies and energy derivatives, that is, to the energies and gradients calculated with the hybrid method, to the Hessian matrices, and to the energies and gradients calculated with the fitted force field. 2 A.7. Reverse Minimization Data. The terms erm,1 and erm,2 are defined in analogy to the terms e2m,1 and em,2 in eqs A5 and A8. A.8. Parameter Restraints. For reasons that are outlined in Section 5, a special restraint term is used to ensure that the references angles, θ0, of the angle-bend terms (see eq 15) found around certain central atoms match a geometrically possible atomic configuration. The reference angles of the inner-anglebend terms of planar rings are also restrained to sum up to the

Acknowledgment. The author thanks Sanofi-Aventis and Astra Zeneca for the generous funding received for this project. The author is particularly indebted to Marc-Antoine Perrin from Sanofi-Aventis, who, over a period of 5 years, created and secured the stimulating and liberal working environment which was required to tackle the challenging task of crystal-structure prediction. Supporting Information Available: Tailor-made force field for cyclohexane-1,4-dione. This material is available free of charge via the Internet at http://pubs.acs.org. References and Notes (1) Lommerse, J. P. M.; Motherwell, W. D. S.; Ammon, H. L.; Dunitz, J. D.; Gavezzotti, A.; Hofmann, D. W. M.; Leusen, F. J. J.; Mooij, W. T. M.; Price, S. L.; Schweizer, B.; Schmidt, M. U.; van Eijck, B. P.; Verwer, P.; Williams, D. E. Acta Crystallogr. B 2000, 56, 697. (2) Motherwell, W. D. S.; Ammon, H. L.; Dunitz, J. D.; Dzyabchenko, A.; Erk, P.; Gavezzotti, A.; Hofmann, D. W. M.; Leusen, F. J. J.; Lommerse, J. P. M.; Mooij, W. T. M.; Price, S. L.; Scheraga, H.; Schweizer, B.;

Tailor-Made Force fields for Crystal-Structure Prediction Schmidt, M. U.; van Eijck, B. P.; Verwer, P.; Williams, D. E. Acta Crystallogr. B 2002, 58, 647. (3) Day, G. M.; Motherwell, W. D. S.; Ammon, H. L.; Boerrigter, S. X. M.; Della Valle, R. G.; Venuti, E.; Dzyabchenko, A.; Dunitz, J. D.; Schweizer, B.; van Eijck, B. P.; Erk, P.; Facelli, J. C.; Bazterra, V. E.; Ferraro, M. B.; Hofmann, D. W. M.; Leusen, F. J. J.; Liang, C.; Pantelides, C. C.; Karamertzanis, P. G.; Price, S. L.; Lewis, T. C.; Nowell, H.; Torrisi, H.; Scheraga, H. A.; Arnautova, Y. A.; Schmidt, M. U.; Verwer, P. Acta Crystallogr. B 2005, 61, 511–527. (4) Neumann, M. A.; Perrin, M.-A. J. Phys. Chem. B 2005, 109, 15531– 15541. (5) Kresse, G.; Hafner, J. Phys. ReV. B 1993, 47, 558. (6) Kresse, G.; Hafner, J. Phys. ReV. B 1994, 49, 14251. (7) Kresse, G.; Furthmu¨ller, J. Comput. Mat. Sci. 1996, 6, 15. (8) Kresse, G.; Furthmu¨ller, J. Phys. ReV. B 1996, 54, 11169. (9) Kresse, G.; Joubert, D. Phys. ReV. B 1999, 59, 1758. (10) The Polymorph Predictor and Cerius2 are products of Accelrys Inc., 9685 Scranton Road, San Diego, CA 92121-3752, USA. (11) Neumann, M. A.;unpublished results. (12) Oganov, A. R.; Glass, C. W. J. Chem. Phys. 2006, 124, 244704– 244714. (13) Mayo, S. L.; Olafson, B. D.; Goddard, W. A., III J. Phys. Chem. 1990, 94, 8897. (14) MacKerrel, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, J.; Ha, S.; JosephMcCarthy, D.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, R.; Nguyen, D. T.; Pordhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, J.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586–3616.

J. Phys. Chem. B, Vol. 112, No. 32, 2008 9829 (15) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179–5197. (16) Halgren, T. A. J. Am. Chem. Soc. 1992, 114, 7827–7843. (17) Hagler, A. T.; Huler, E.; Lifson, S. J. Am. Chem. Soc. 1974, 96, 5319. (18) Sun, H. J. Phys. Chem. B 1998, 102, 7338–7364. (19) Rappe´, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A.; Skiff, W. M. J. Am. Chem. Soc. 1992, 114, 10024–10035. (20) Mooij, W. T. M.; Leusen, F. J. J. Phys. Chem. Chem. Phys. 2001, 3, 5063–5066. (21) Day, G. M.; Motherwell, W. D.; Jones, W. Cryst. Growth Des. 2005, 5, 1023–1033. (22) Karamertzanis, P. G.; Price, S. L. J. Chem. Theory. Comput. 2006, 2, 1184–1199. (23) Brodersen, S.; Wilke, S.; Leusen, F. J. J.; Engel, G. Phys. Chem. Chem. Phys. 2003, 5, 4923–4931. (24) Brodersen, S.; Wilke, S.; Leusen, F. J. J.; Engel, G. Cryst. Growth Des. 2005, 5, 925–933. (25) Neumann, M. A.; Leusen, F. J. J.; Kendrick, J. Angew. Chem., Int. Ed. 2008, 47, 2427–2430. (26) GRACE; Avant-garde Materials Simulation: St-Germain-en-Laye, France (http://www.avmatsim.eu). (27) Halgren, T. A. J. Am. Chem. Soc. 1992, 114, 7827. (28) Wu, Q.; Yang, W. J. Chem. Phys. 2002, 116, 515. (29) McQuarrie, D. A. Statistical Mechanics; University Science Books: Sausalito, CA, 2000; p, 97. (30) Papadopoulos, P. M.; Papadopoulos, C. A.; Katz, M. J.; Link, W. J.; Bruno, G. Int. J. High Perf. Comp. Appl. 2004, 18, 317–326. (31) Mossel, A.; Romers, C. Acta Crystallogr. 1964, 17, 1217.

JP710575H