Enhancing Crystal-Structure Prediction with NMR Tensor Data James K. Harper and David M. Grant* Department of Chemistry, UniVersity of Utah, 315 South 1400 East Salt Lake City, Utah 84112 ReceiVed April 24, 2006; ReVised Manuscript ReceiVed July 21, 2006
CRYSTAL GROWTH & DESIGN 2006 VOL. 6, NO. 10 2315-2321
ABSTRACT: A method for selecting high-probability structures from numerous computer-generated crystal structures is described. This procedure evaluates structures by comparing computed NMR shifts for each predicted structure to experimental solid-state NMR data. Four carbohydrates are evaluated: methyl β-D-galactopyranoside, methyl R-D-glucopyranoside, methyl R-Dmannopyranoside, and methyl β-D-xylopyranoside. In these cases, 81.8% of the structures retained as probable fits by lattice energy comparisons are eliminated by the NMR criterion using tensor principal values. This analysis also ranks the correct structure as the best-fit for methyl R-D-glucopyranoside and usually places the correct structure among the top five in other cases. Isotropic shift comparisons are less successful in selecting structure. The NMR analysis is sufficiently sensitive to identify a 30° error in one torsion angle of the purported correct structure of methyl β-D-galactopyranoside. In this case, it is found that none of the 164 computer-generated structures match experimental data. The substances investigated experience only weak electrostatic fields; therefore, the NMR analysis chooses primarily by molecular conformation rather than lattice structure. NMR data thus provide a valuable independent selection criterion. The presence of strong electrostatic fields in polar samples can alter the results given here, and likely changes in the selection process are discussed. Introduction The determination of solid-state structure solely from the knowledge of atomic connectivity is the goal of ab initio crystalstructure prediction (CSP). Success in such an endeavor would yield valuable insight into solids that fail to form single crystals suitable for diffraction. However, accurate prediction of crystal structure has proven to be a formidable challenge, despite a detailed understanding of the physical parameters involved.1 The shortcomings of early efforts led one writer to call them “one of the continuing scandals of the physical sciences”.2 Fortunately, recent developments in the predictive process have resulted in more sophisticated and computationally efficient techniques that remedy some of these difficulties.3 These theoretical methods typically rely on minimizing the lattice energy in a large set of computer-generated structures and selecting the lowest-energy forms as possible polymorphs. This process is presently most successful with rigid structures containing one molecule per asymmetric unit.3 Usually, structures are generated in only the most commonly observed space groups, though techniques capable of including all 230 space groups have been reported.4 Present methods are capable of generating thousands of crystal structures but continue to achieve only modest success in identifying a single candidate that matches an experimental structure. One reason is the inaccuracy of present-day force fields, but closer study of thermal and kinetic effects is also needed. Despite these limitations, the correct crystal structure is usually found among the predicted structures generated. Perhaps most significantly, these methods have illustrated that typical molecules have access to a remarkable variety of crystal structures whose energies differ by only a few kilojoules per mole from the global minimum. Ranking these structures solely by computed energy, therefore, creates a difficulty because relatively small errors in the computed energy can alter the order. Thus, it is desirable to explore alternative methods of comparison. An alternative criterion for evaluating predicted structures has previously been investigated using X-ray powder diffraction * To whom correspondence should be addressed. E-mail: grant@ chemistry.utah.edu. Phone: 801-581-8854. Fax: 801-581-8433.
data. In a blind test conducted at the Cambridge Crystallographic Data Centre in 2002, an evaluation was performed using nonindexed powder diffraction data generated from actual singlecrystal coordinates.3a Participants ranked their predicted structures by a profile comparison with the experimental data to select a correct structure. However, this approach proved of limited value in identifying the structure corresponding to an experimentally observed polymorph. Enthalpy has also been proposed as a selection criterion and has been shown to provide unique selectivity in benzene.5 However, its value in more general structural analysis has yet to be demonstrated. Another alternative involves solid-state NMR (SSNMR) data. Such data have already been demonstrated to improve the determination of structure in powders when combined with X-ray powder diffraction data.6 It is anticipated that SSNMR data should likewise complement energy comparisons because these data are quite sensitive to short-range structure within roughly 1-4 Å of each magnetically active nucleus. Moreover, NMR shifts can be easily computed for all CSP structures just as energy presently is. A comparison can then be made of the shifts calculated for each computer-generated structure with those measured experimentally to find a best match. Strictly speaking, this approach deviates from ab initio crystal-structure prediction because experimental data are included. However, these data are not used in any way to constrain or guide the prediction process, only to assess the quality of CSP structures for a particular solid. In reality, most present predictions perform comparable verifications of accuracy against some type of experimental data, such as single-crystal diffraction information. In practice, inclusion of experimental data in this manner may be necessary because of the tendency of the prediction processes to predict forms that are not realizable under ordinary conditions. For example, in certain solids, it has been demonstrated that some low-energy CSP structures are obtainable only at high pressures.5,7 In this work, evaluations are performed using SSNMR chemical shift data to select among CSP structures of four small carbohydrates.8 The SSNMR data are taken from previous work9 and, for each molecular position, consist of both a single isotropic shift and the three principal shift values of the chemical shift tensor.
10.1021/cg060244g CCC: $33.50 © 2006 American Chemical Society Published on Web 09/06/2006
2316 Crystal Growth & Design, Vol. 6, No. 10, 2006
Harper and Grant
Experimental Section All experimental NMR data for the four carbohydrates studied herein were obtained from a prior study and used without modification.9 Computer-generated crystal structures were likewise obtained from previous work8 and the structure generation process is summarized here for convenience. Initial structure generation considered the carbohydrate as a rigid molecule with all OH pairs modeled as single atoms. This molecule was taken as the asymmetric unit and placed in a variety of positions and orientations in a unit cell of varying dimensions. Only space groups with triclinic, monoclinic, or orthorhombic symmetry were considered. Structures generated at this stage were evaluated using only the simple force field, GROMOS87,10 in order to allow millions of initial structures to be considered. This process was repeated for all probable conformations to account for molecular flexibility. The majority of these initial structures were eliminated by a criterion that rejected any structure in which any atomic repulsion energy term exceeded 10 000 kJ/mol. An energy minimization was then performed on the remaining structures, and those dropping below a certain threshold were retained. A second evaluation step treated the hydroxyl O and H as separate atoms and placed these H’s in the three staggered positions about the C-O bond axis. Energy minimizations using UNITAT11 were then performed involving only the OH dihedral angles and the crystallographic parameters. A full structure relaxation was subsequently performed on the structures obtained. A third and final stage involved an energy minimization with the more sophisticated OPLS force field.12 A clustering procedure was then used to eliminate repeats of a given structure. The final structures were checked for stability by relaxing the space group symmetry to P1 then performing a molecular dynamics shake-up to allow unstable structures to adjust to new forms. Chemical shielding tensors were computed for each CSP structure using Gaussian 0313 at the B3PW91/D95** level of theory14 with parallel processing. All computations were performed using only the single molecule constituting the asymmetric unit. The optimal conversion of computed shieldings to shifts was determined by making a plot of shielding versus shift for the principal values of the four neutron diffraction structures according to the icosahedral method of Alderman et al.15 This plot was linear (r2 ) 0.986) and gave a conversion of shift ) (shielding - 193.68)/(-1.091) with an rms error of 1.79 ppm in computed principal values. When each predicted structure is compared to experimental data, the rms error was computed as the icosahedral distance (d) according to a previously described procedure.15 F-values for each predicted structure were obtained as F ) (di2/dj2), where di represents the rms icosahedral distance of the ith CSP structure from the experimental data and dj denotes the corresponding distance for the best-fit predicted structure.
Results and Discussion Using Isotropic and Shift Tensor NMR Data to Rank Predicted Crystal Structures. Crystal structure is encoded in SSNMR chemical shifts under certain circumstances. Presently, this is known to occur in two cases. Case I occurs when longrange electrostatic fields are not found in the lattice. This case typically occurs when no formally charged atoms are present. Here, the molecules adopt conformations that minimize the sum of intermolecular and intramolecular interactions. Such conformational adjustments are strongly reflected in chemical shifts,16 providing insight regarding a molecule’s nearby surroundings and thereby crystal structure. Case II arises when a molecule contains formally charged atoms or certain polar moieties such as CdO.17 These groups create electrostatic fields that can extend over a distance that is sufficiently long to influence the SSNMR shifts of neighboring molecules and thus to reflect the crystallographic arrangement.18 Case II may include the conformational variations and accompanying shift changes found in case I as well as variations due to field effects. Three levels of shift data are available from SSNMR: a single isotropic shift per nucleus, the three tensor principal values per nucleus, and the six values specifying the full 3 × 3 tensor.19
Figure 1. Structures of the four methyl glycosides analyzed, including the numbering used. Structures shown are methyl β-D-galactopyranoside (top left), methyl R-D-glucopyranoside (top right), methyl R-Dmannopyranoside (bottom left), and methyl β-D-xylopyranoside.
Isotropic and principal value data are obtainable from microcrystalline powders, whereas relatively large single crystals are usually needed to obtain the full tensor. Because powders are more abundant than large single crystals, the isotropic and principal value data will be the focus of this study. Indeed, if a large single crystal is available, there is no need for CSP. The four carbohydrates selected for analysis were chosen because both CSP8 and high quality SSNMR data9 are available. Additionally, in all four cases, the actual crystal structure examined by SSNMR was thought to be included among the computer-generated structures. The molecules evaluated were methyl R-D-glucopyranoside (ref. code MGLUCP11), methyl R-D-mannopyranoside (MEMANP11), methyl β-D-galactopyranoside (MBDGAL02), and methyl β-D-xylopyranoside (XYLOBM01), as shown in Figure 1. The CSP structures included in this analysis were those within 20-25 kJ/mol of the lowestenergy structure.8 These energies were calculated using the OPLS force field, which has been shown to have a root-meansquare deviation between the OPLS energies and ab initio crystal energies of about 10 kJ/mol in comparable carbohydrates.20 It is therefore unlikely that the computed lattice energies are sufficiently accurate for the purpose of polymorph selection. NMR shifts were computed for all CSP structures as described in the Experimental Section. For the molecules studied here, inclusion of neighboring molecules in the NMR calculation has been shown to be unnecessary,21 providing the O-H groups are retained in the conformation dictated by the lattice. Thus, these carbohydrates reflect their environment according to case I described above and inclusion of long-range lattice detail in computations does not significantly change the calculated chemical shieldings. Hence, NMR computations were performed using the single molecule that constitutes the asymmetric unit of the predicted structure. Accurate selection of high-probability structures is enhanced by a prudent choice of criteria used to determine the quality of fit. In this study a simple F-test was performed using an approach described previously.16g This method has the advantages of being relatively insensitive to systematic errors in the NMR shift computations,22 and statistical probabilities may be assigned to any predicted structure. Application of this selection criterion is illustrated below. Methyl β-D-Xylopyranoside. In methyl β-D-xylopyranoside, crystal-structure prediction produced 39 structures within 22 kJ/ mol of the global minimum in lattice energy of 52.1 kJ/mol. Of these, F-test comparisons for isotropic shifts alone reduced this number to 12 probable structures that were statistically indistinguishable from the best fit structure. The fit to only principal values was slightly more sensitive and retained 10 structures.23 Both comparisons were performed at the 90% probability level
Crystal-Structure Prediction with NMR Tensor Data
Crystal Growth & Design, Vol. 6, No. 10, 2006 2317
Table 1. Rankings of Predicted Crystal Structures by NMR and Lattice Energy no. of structures retained structure ref. code XYLOBM01 MEMANP11 MGLUCP11 MBDGAL02a
ranking of correct structure
lattice energy
isotropic NMR fit
principal value NMR fit
lattice energy
isotropic NMR fit
principal value NMR fit
39 165 54 164
12 15 31 46
10 7 30 7
5 1 17 1
10 8 17 55a
5 3 1 24a
a The 164 predicted structures do not include a structure matching both the neutron diffraction unit-cell dimensions and all molecular conformations. Therefore, rankings given here are for the predicted structure most closely matching the experimental structure (i.e., CSP structure 31273).8 The NMR rankings of structure 31273 improve to 2 and 5 for isotropic and principal value fits, respectively, if the C3 O-H hydrogen is placed in the neutrondiffraction-determined position and the remaining structure is then refined as described in the text.
Figure 2. Comparison of the lattice energy vs NMR fit for the predicted crystal structures of methyl β-D-xylopyranoside. All 39 structures retained by lattice energy considerations are shown on the right, and those remaining after NMR analysis (90% probability level) are shown at the left. Comparisons made on the basis of principal shift values are shown in the top two plots, with isotropic shift fits shown in the bottom two. The probability levels are labeled in the right-hand plots to illustrate the selectivity differences between analyses on the basis of tensor principal values and isotropic shifts.
using error values of 1.33 and 2.12 ppm for the isotropic and principal values, respectively, for the best-fit methyl β-Dxylopyranoside structure. In both cases, the predicted structure matching experimental data was included among the retained structures. None of the methods, however, ranked this structure as the best-fit, with isotropic shifts, principal values, and energy comparisons providing rankings of 10, 5, and 5, respectively (Table 1). Plots of the NMR fits and the corresponding lattice energies for all structures are given in Figure 2. Similar evaluations for the three remaining glycosides are given below. Methyl R-D-Mannopyranoside. Crystal-structure prediction for methyl R-D-mannopyranoside produced 165 structures within 20 kJ/mol of the global minimum in an lattice energy of 80.2 kJ/mol. Of these, comparison of only isotropic shifts eliminated 150 structures as statistically improbable. The fit to principal value data was more sensitive and rejected 158 structures. Both comparisons, shown in Figure 3, were performed at the 90% probability level using error values of 0.93 and 1.61 ppm for the isotropic and principal values, respectively, for the best-fit methyl R-D-mannopyranoside structure. In both cases, the CSP structure matching experimental data was retained among the remaining structures. Neither of the methods, however, ranked this structure as the best-fit, with isotropic and principal values
Figure 3. Comparison of the lattice energy and NMR fit for methyl R-D-mannopyranoside showing only those predicted structures retained by the NMR comparison (90% probability). The seven structures fitting principal shift values are shown in the top plot, with the 15 fitting isotropic shifts shown at the bottom.
providing rankings of 8 and 3, respectively (Table 1). By comparison, this structure ranked first in the lattice energy calculation. Methyl r-D-Glucopyranoside. Crystal-structure prediction for methyl R-D-glucopyranoside produced 54 structures within 20 kJ/mol of the lattice energy global minimum of 71.5 kJ/mol. Of these, comparison of isotropic shifts alone eliminated 23 structures, whereas the fit to principal value data removed 24 structures. Both analyses were performed at the 90% probability level using error values of 1.99 and 2.89 for the best-fit isotropic and principal value fits, respectively. In both the isotropic and principal value comparisons, the predicted structure matching the experimental data was retained. This structure was ranked 1, 17, and 17 in comparisons to principal values, isotropic shifts, and energy, respectively. Plots of the NMR fits and the corresponding lattice energies for all retained structures are given in Figure 4. Methyl β-D-Galactopyranoside. Crystal-structure prediction produced 164 structures within 25 kJ/mol of the global minimum of 84.9 kJ/mol. Of these, comparison of isotropic shifts alone eliminated 118 at the 90% probability level. The fit to only principal value data eliminated 157 structures at the same probability. Unfortunately, in both cases, the CSP structure alleged to match the experimental structure was eliminated as a probable lattice. Identifying the origin of this omission is crucial if SSNMR data are to be used reliably in crystal-structure prediction. A discussion of the source of this problem is given
2318 Crystal Growth & Design, Vol. 6, No. 10, 2006
Figure 4. Comparison of the lattice energy and NMR fit for methyl R-D-glucopyranoside showing only the predicted structures retained by the NMR analysis (90% probability level). The 30 structures fitting principal shift values are shown in the top plot, with the 31 fitting isotropic shifts shown at the bottom.
below. Table 1 provides a summary of all comparisons performed for the four glycosides examined here. Reconciling Differences in NMR and Energy Predictions for Methyl β-D-Galactopyranoside. The failure of the SSNMR structure-selection criterion to retain the predicted structure of methyl β-D-galactopyranoside thought to best fit the diffraction data (structure 31273) requires more careful evaluation. It is possible to differentiate errors in the NMR selection methodology from errors in predicted structure by including the actual neutron diffraction structure among the CSP structures and computing its NMR shifts for comparison with experimental data. When this was done, the neutron diffraction structure was retained as a highly probable fit by the principal values (σ ) 1.98 ppm versus 1.96 ppm for the best-fit CSP structure), verifying that a structure having the correct geometry will be retained by the SSNMR analysis. Because the lattice parameters of structure 31273 closely match experimental data, this result implies that the correct combination of lattice parameters and molecular geometry was not to be found among the 164 generated structures. However, comparison of the experimental bond lengths and valence angles with CSP structure 31273 revealed that these parameters were quite accurate with respective errors of 0.011-0.015 Å and 1.52-2.06°. Moreover, the best-fit CSP structure, by the NMR criterion, had nearly identical errors, yet was retained as a probable structure. Thus, predicted bond lengths and valence angles were not the source of the error. Dihedral angles involving non-hydrogen atoms were also accurate, with typical errors of