Enhanced Efficiency of Direct-Space Structure Solution from Powder X

Publication Date (Web): May 22, 2007 ... the feasibility of molecular conformations within direct-space structure-solution calculations of organic mol...
0 downloads 0 Views 376KB Size
J. Phys. Chem. B 2007, 111, 6349-6356

6349

Enhanced Efficiency of Direct-Space Structure Solution from Powder X-ray Diffraction Data in the Case of Conformationally Flexible Molecules Andrew J. Hanson, Eugene Y. Cheung, and Kenneth D. M. Harris* School of Chemistry, Cardiff UniVersity, Park Place, Cardiff, CF10 3AT Wales, U.K. ReceiVed: February 3, 2007

A strategy is reported for assessing the feasibility of molecular conformations within direct-space structuresolution calculations of organic molecular crystal structures from powder X-ray diffraction data, focusing in particular on the genetic algorithm technique for structure solution in which fitness is defined as a function of the whole-profile figure-of-merit Rwp. The strategy employs a readily computed distance-based function to assess the feasibility of the molecular conformation in each trial structure generated in the genetic algorithm calculation, and structures considered to have low-feasibility conformations are penalized within the evolutionary process. The strategy is shown to lead to significant improvements in the success rate of structure-solution calculations in the case of flexible molecules with a significant number of conformational degrees of freedom.

Introduction Global optimization techniques are applied across a wide range of areas of the physical sciences,1 including applications in studies of protein folding2a and protein-substrate interactions,2b structure prediction of clusters2c-e and crystals2f based on location of global energy minima, structure determination from diffraction data (including electron diffraction2g and powder X-ray diffraction3,4), interpretation of spectroscopic (e.g., NMR) data,2h optimization of laser pulse shapes for quantum control of chemical reactions,2i,j and derivation of optimized force fields.2k Several different global optimization algorithms5 have been employed in these fields of investigation, including Monte Carlo and simulated annealing methods, neural networks, genetic algorithms, and other evolutionary procedures. Although details of the computational implementations of global optimization strategies necessarily differ among these different areas of application, there are nevertheless many features in common between the approaches employed in different fields, and fundamental advances established within one field of application are often directly transferable to other fields. Among the areas in which global optimization techniques are currently employed, structure determination of organic molecular solids from powder X-ray diffraction data is currently carried out widelysindeed, the recent upsurge of activity in this field3 coincided directly with the development of the “direct-space strategy” for structure solution4 in which the structure-solution process is considered as a problem of global optimization. Such techniques for carrying out structure determination directly from powder X-ray diffraction data provide a critical opportunity for structural characterization of the wide range of materials that cannot be prepared as single crystals of suitable size and/or quality for carrying out single-crystal X-ray diffraction experiments. There are two basic strategies for carrying out structure solution from powder X-ray diffraction data: the traditional and direct-space approaches. The traditional approach follows a close analogy to the analysis of single-crystal X-ray diffraction * To whom correspondence should be addressed. E-mail: HarrisKDM@ cardiff.ac.uk.

data and involves first the extraction of the intensities, I(hkl), of individual reflections from the powder X-ray diffraction pattern, which are then used in the types of structure-solution calculation (e.g., direct methods or Patterson methods) that are used for single-crystal X-ray diffraction data. The direct-space strategy, on the other hand, exploits the fact that the powder X-ray diffraction pattern can be calculated automatically for any proposed structure. (Note that the converse is not true, as a consequence of the “phase problem”.) Thus, trial structures are generated independently of the experimental powder X-ray diffraction pattern, and the quality of each trial structure is assessed by direct comparison between the powder diffraction pattern calculated for the trial structure and the experimental powder diffraction pattern. This comparison is quantified using an appropriate figure-of-merit. In our work in this field, the figure-of-merit employed is the weighted powder profile R factor Rwp,6 which implicitly takes peak overlap into consideration. The direct-space strategy handles structure solution as a global optimization problem, with the aim of finding the trial structure that corresponds to optimal agreement (lowest Rwp) between calculated and experimental powder diffraction patterns and is equivalent to exploring the hypersurface Rwp(Γ) to find the global minimum, where Γ represents the set of variables that define the structure. A number of different global optimization techniques have been used in this field,3,4 and our current implementation of the direct-space approach for structure solution is based on a genetic algorithm (GA) search technique.7 The direct-space strategy has been found to be particularly appropriate in the case of structure determination of organic molecular crystal structures. Such materials typically have large unit cells and low symmetry, which lead to a high density of peaks (and hence extensive peak overlap) in the powder X-ray diffraction pattern. In defining the structural variables within the set Γ in a directspace structure-solution calculation, it is common practice to keep the total number of variables as low as possible in order to reduce the dimensionality of the search space. However, it is advantageous to make maximum use of reliably known information on molecular geometry, for example, by fixing bond lengths and bond angles at standard values during the structure-

10.1021/jp070954c CCC: $37.00 © 2007 American Chemical Society Published on Web 05/22/2007

6350 J. Phys. Chem. B, Vol. 111, No. 23, 2007

Hanson et al.

TABLE 1: Percentage of Trial Structures for Which ∆N Is Less Than a Certain Specified Value (Denoted ∆c) in the Distributions of ∆N Values Shown in Figure 2 ∆c

molecule A

molecule B

molecule C

molecule D

0.02 0.05 0.10 0.25 0.40 0.50 0.80 1.00

3.8 52.8 72.2 87.6 94.2 96.4 99.8 100.0

5.2 54.4 70.9 84.8 92.9 96.0 99.8 100.0

18.5 53.9 63.7 77.2 89.7 94.9 99.8 100.0

22.5 55.1 64.9 79.8 92.2 96.4 99.9 100.0

solution calculation and by fixing well-defined structural units, such as aromatic rings, at standard geometries. Under these circumstances, the structural variables in the set Γ comprise, for each molecule in the asymmetric unit, the position {x, y, z} and orientation {θ, φ, ψ} of the molecule and a set of n variable torsion angles {τ1, τ2, ..., τn} to define the molecular conformation. For organic molecular solids, most reported crystal structure determinations from powder X-ray diffraction data have employed the direct-space strategy, although a number of successful applications of the traditional approach for structure solution have also been reported. For structure solution by the traditional approach, the complexity of the problem generally depends on the number of atoms in the asymmetric unit, whereas for structure solution by the direct-space approach, the complexity of the problem is dictated by the dimensionality of the hypersurface to be explored (i.e., the total number of structural variables in the set Γ). The greatest challenges in the application of direct-space techniques are thus encountered when the number of structural variables is large and arise when the molecule has considerable conformational flexibility (i.e., when a large number of variable torsion angles are required to define the molecular conformation) and/or when there are several independent molecules in the asymmetric unit. In the present article, we propose and demonstrate a strategy for assessing the feasibility of molecular conformations within direct-space structure-solution calculations for organic molecular crystal structures from powder X-ray diffraction data, focusing in particular on the genetic algorithm technique for structure solution in which fitness is defined as a function of the wholeprofile figure-of-merit Rwp. The strategy employs a readily computed distance-based function to assess the feasibility of the molecular conformation in each trial structure generated in the genetic algorithm calculation, and trial structures considered to have conformations of low feasibility are penalized within the evolutionary process. Summary of the Genetic Algorithm Technique for Structure Solution The aim of the genetic algorithm (GA) technique is to carry out global optimization by mimicking the processes of evolution that occur in biological systems. Thus, the technique investigates the evolution of a population through successive generations and involves familiar evolutionary operations such as mating, mutation, and natural selection. Our GA technique7 for structure solution from powder X-ray diffraction data, implemented in the program EAGER,8 investigates the evolution of a population of trial crystal structures, with each member of the population defined by a set of variables Γ, as discussed above. Each member of the population is uniquely characterized by the values of these variables, and the set Γ defines its “genetic code”. The initial population Po comprises Npop randomly generated struc-

TABLE 2: Percentage Success Rate at the End of the GA Calculations (Corresponding to Ngenmax Generations) for Each of the Test Molecules A-Da ∆feas Ngenmax 0.02 0.05 0.10 0.25 0.40 0.50 0.80 1.00 molecule A molecule B molecule C molecule D

20 40 60 60

2 4 82 16

76 42 84 38

76 46 74 34

58 52 98 44

52 48 82 38

60 52 74 24

52 44 86 24

40 38 72 20

a In each case, the percentage success rate was determined from 50 independent GA calculations (and represents the final point on each of the lines shown in Figure 3).

tures. The population is then allowed to evolve through subsequent generations by applying the evolutionary operations of mating, mutation, and natural selection. Through these operations, a given generation (population Pj) is converted to the next generation (population Pj+1). The number Npop of structures in the population is constant for all generations, and Nmat mating operations and Nmut mutation operations are performed during the evolution from population Pj to population Pj+1. The quality (“fitness”) of each structure in the population depends on its value of Rwp (a lower Rwp value represents higher fitness), and it is advantageous to define fitness as an appropriate decreasing function of Rwp. In the mating procedure, a given number (Nmat) of pairs of structures (“parents”) are selected from the population (the probability of selecting a given structure as a parent is proportional to its fitness), and for each pair of parents, two new structures (“offspring”) are generated by distributing parts of the genetic codes of the two parents among the two offspring. In the mutation procedure, a given number (Nmut) of structures are selected at random from the population, and random changes are made to parts of their genetic code to create mutant structures. (The original structures from which the mutants are derived are still retained within the population.) In the natural selection procedure, only the structures of highest fitness (lowest Rwp) are allowed to pass from one generation to the next generation in the GA calculation. After the population has evolved for a sufficiently large number of generations, the structure with the lowest value of Rwp should be close to the correct structure. Methodology for Assessing Feasibility of the Molecular Conformation Our aim here is to develop a straightforward and computationally efficient means of assessing the feasibility of molecular conformations in trial structures during direct-space structure solution from powder X-ray diffraction data and to find a procedure for utilizing this information in a way that enhances the success of the calculations. Clearly, one approach would be to use an accurate potential energy parametrization and to base the assessment of conformational feasibility on the intramolecular potential energy computed using such a parametrization. Although we initially considered adopting this approach, it suffers from a number of disadvantages. First, it is common, in direct-space structure-solution calculations of organic molecular crystals using powder X-ray diffraction data, to omit hydrogen atoms, given the low X-ray scattering power of hydrogen relative to other elements present in such structures. (Thus, inclusion of hydrogen atoms in the structural model would significantly increase the time required to compute Rwp, while not providing any significant improvement in the accuracy of the Rwp calculation.) Under such circumstances, the

Enhanced Efficiency of Direct-Space Structure Solution

J. Phys. Chem. B, Vol. 111, No. 23, 2007 6351

use of an accurate potential energy parametrization (developed and optimized on the basis of complete molecules including hydrogen atoms) would lead to energies that were not a reflection of the true intramolecular potential energy (although they might nevertheless provide a reasonable assessment of the relative ranking of structures). Thus, in the present context, accuracy of the intramolecular potential energy is not a relevant issue. Second, the intramolecular potential energy can approach infinity for conformations in which a particular interatomic distance takes a value close to zero. In terms of ranking the feasibility of molecular conformations, it is desirable to have a quantity that has finite bounds, allowing straightforward normalization, as discussed below. Third, the use of an accurate expression for intramolecular potential energy might give rise to a significant increase in computation time, while providing no significant advantage in terms of ranking structures on the basis of conformational feasibility, in comparison to the more approximate approach described below. For these reasons, the strategy reported here does not use intramolecular potential energy as the quantity to assess feasibility of molecular conformation, but rather uses a quantity that has the following essential characteristics: (i) it provides a reliable ranking of structures according to whether there are unfavorable features in the molecular conformation (and is applicable for the case in which hydrogen atoms are omitted from the molecule), (ii) it has finite bounds, and (iii) it is quick to calculate. The procedure that we have developed is now described. Each time a new structure is created during the GA calculation (e.g., as an offspring in the mating operation), a “feasibility score” for the molecular conformation is calculated, the value of which depends on the extent to which the molecular conformation has unfavorable intramolecular interactions. The feasibility score contains a contribution (denoted ∆ij) from the interaction between each pair of atoms (labeled i and j) in the molecule, with ∆ij given by

∆ij ) DIJ - dij ∆ij ) 0

if dij < DIJ if dij g DIJ

(1)

where dij is the distance between atoms i and j and DIJ is a parameter that depends on the types of atoms i and j (subscripts I and J are used to denote parameters that depend on the atom types of atoms i and j). DIJ is defined as

DIJ ) RI + RJ + mIJ

(2)

where RI and RJ represent van der Waals radii of atoms of types I and J, respectively,9 and mIJ is a parameter that might depend on the atom types I and J.9 In practice, the use of a small positive value for mIJ (typically ca. 0.55 Å) allows ∆ij to provide discrimination among interatomic distances that are slightly larger than the sum of van der Waals radii. To obtain the total feasibility score (denoted ∆tot) for the molecular conformation, a summation is carried out over all pairs of atoms in the molecule ntot-1 ntot

∆tot )

∑ ∑ ∆ij2 i)1 j)i+1

(3)

where ntot is the total number of atoms in the molecule.10 The value of ∆tot does not itself have any direct physical meaning, but it does provide an easily computed value for ranking molecular conformations according to the extent to

which they contain unfavorable intramolecular interactions (i.e., with dij significantly less than DIJ) such that, as ∆tot increases, the molecular conformation is considered to be less feasible. As the GA calculation progresses, the value of ∆tot is normalized by considering the values of ∆tot for all trial structures generated since the start of the calculation. Using ∆max and ∆min to denote the highest and lowest values, respectively, of ∆tot among all the trial structures sampled to a given point in the GA calculation (i.e., not just the trial structures in the current generation), the value of ∆tot for each structure is normalized as follows

∆N )

∆tot - ∆min ∆max - ∆min

(4)

with 0 e ∆N e 1. Clearly, the values of ∆max and ∆min are updated in each generation of the GA calculation. The normalized value (∆N) provides an assessment of the conformational feasibility of each structure relative to all other structures generated to that point in the GA calculation, without requiring prior knowledge of the expected range of values of ∆tot. For each new trial structure generated in the GA calculation, the value of ∆N is calculated, and trial structures with ∆N greater than a predefined value (∆feas) are penalized by having their fitness (F) decreased. In the present work, the decreased value of fitness is given by R-1F, where R is a predefined constant (typically R ) 2). Clearly, a versatile range of procedures for carrying out the reduction of fitness could be envisaged in future developments of the technique.11 Reduction of the fitness of trial structures with low-feasibility conformations effectively decreases the likelihood that such trial structures will pass through natural selection into the next generation (but does not prohibit them from passing into the next generation). Thus, structures with low-feasibility conformations (such that ∆N > ∆feas) are not automatically eliminated from the population, but rather it is made more difficult for them to pass into the next generation than it otherwise would be based on consideration of Rwp alone. Thus, for example, a structure that has a relatively low value of Rwp and hence a high value of fitness but has some unfavorable intramolecular interaction such that ∆N > ∆feas, could, even after reduction of its fitness by a factor of R-1, still have sufficiently high fitness that it survives through natural selection and passes into the next generation. In the limiting case of ∆feas ) 1, no trial structures are penalized by reduction of fitness, and this situation is therefore analogous to the normal GA calculation in which no assessment of conformational feasibility is carried out. Clearly, as ∆feas becomes smaller, an increasing proportion of trial structures are penalized by reduction of fitness. Results and Discussion To assess the methodology discussed above, four molecular crystal structures were considered as test cases, all representing molecules with significant conformational flexibility and representative of the types of structural problem now routinely solved from powder X-ray diffraction data. In each case, the structure was determined previously from powder X-ray diffraction data (see the references cited), and the actual powder X-ray diffraction data used in the present work were the same data as used in the previous structure determination. In each structure, there is one molecule in the asymmetric unit. Molecule A is the peptide Piv-LPro-Gly-NHMe (Figure 1a), defined by seven variable torsion angles (total number of

6352 J. Phys. Chem. B, Vol. 111, No. 23, 2007

Figure 1. Molecular structures of the test molecules (a) A, (b) B, (c) C, and (d) D. Arrows indicate the variable torsion angles in the GA structure-solution calculations.

structural variables ) 10; in space group P1, the position {x, y, z} of the molecule can be fixed arbitrarily).13a Molecule B is the peptide Piv-LPro-γ-Abu-NHMe (Figure 1b), defined by nine variable torsion angles (total number of structural variables ) 15).7e Molecule C is pentamethylene-bis(diphenylphosphine oxide) [Ph2P(O)(CH2)5P(O)Ph2] (Figure 1c), defined by 10 variable torsion angles (total number of structural variables ) 16).13b Molecule D is heptamethylene-bis(diphenylphosphine oxide) [Ph2P(O)(CH2)7P(O)Ph2] (Figure 1d), defined by 12 variable torsion angles (total number of structural variables ) 18).13c First, we consider the distribution of ∆N values for each molecule, determined from 100000 randomly generated trial crystal structures (i.e., with the correct unit cell and space group, but with random values for the structural variables {x, y, z, θ, φ, ψ, τ1, τ2, ..., τn}). The distributions are shown in Figure 2. In all cases, the distribution is heavily skewed toward low values of ∆N. For C and D, the maximum of the distribution is at ∆N ) 0, whereas for A and B, the maximum of the distribution is at ∆N ≈ 0.02. Table 1 shows the percentages of trial structures with ∆N less than a specific value (∆feas) in the ∆N distributions for A-D. In all cases, slightly more than 50% of the trial structures have ∆N < 0.05, and ca. 95% of trial structures have ∆N < 0.5. The percentage of structures with ∆N < 0.25 ranges from 88% (for A) to 77% (for C). Next, we discuss the results from application of the feasibility assessment within GA structure-solution calculations for molecules A-D. In each case, several different values of ∆feas were considered. For each molecule and for a given value of ∆feas,

Hanson et al. 50 independent GA calculations were carried out, starting from a different (randomly generated) initial population in each case. Apart from the value of ∆feas, all other conditions of the GA calculations were kept constant. Thus, for a given molecule, the same set of 50 initial populations was used for each value of ∆feas, thus ensuring that differences in the results would arise only from the use of the different values of ∆feas. The parameters used in the GA calculation were fixed at the following values (where Ngenmax denotes the maximum number of generations): For molecule A, Npop ) 25, Nmat ) 12, Nmut ) 7, Ngenmax ) 20, R ) 2. For molecule B, Npop ) 50, Nmat ) 25, Nmut ) 10, Ngenmax ) 40, R ) 2. For molecule C, Npop ) 50, Nmat ) 25, Nmut ) 10, Ngenmax ) 60, R ) 2. For molecule D, Npop ) 50, Nmat ) 25, Nmut ) 10, Ngenmax ) 60, R ) 2. It is important to emphasize that these values of Npop are lower than would normally be used in applications of the GA technique for structure solution of new unknown structures. The use of artificially low population sizes in such testing work serves to make the calculations relatively more difficult and thus provides a more stringent test of the effects of the different values of ∆feas. The GA calculations employed an exponential fitness function (defined previously7b) and an implementation of Lamarckian evolution involving local minimization of Rwp for all new trial structures generated in the calculation.7c All calculations were carried out using a single-population version of EAGER.8 We consider the cumulative success rate (denoted S) at a given number of generations (Ngen) as the percentage of the 50 independent GA calculations that successfully locate the correct structure solution at (or before) Ngen generations. For each of the test molecules A-D and for each value of ∆feas, the graph of S versus Ngen is shown in Figure 3, and the value of S at the end of the calculation (i.e., at Ngen ) Ngenmax) is reported in Table 2. To quantify the improvements obtained by the assessment of conformational feasibility relative to the standard GA structure-solution calculation (which corresponds to ∆feas ) 1), we consider the ratio S(∆feas)0.25)/S(∆feas)1). Graphs of S(∆feas)0.25)/S(∆feas)1) versus Ngen for each of the test molecules A-D are shown in Figure 4. At low values of Ngen, the ratio S(∆feas)0.25)/S(∆feas)1) is not defined, as S(∆feas)1) ) 0. For sufficiently high Ngen (higher than the values probed in the present work), S(∆feas)0.25)/S(∆feas)1) is expected to tend toward a value of 1, as S(∆feas)0.25) f 1 and S(∆feas)1) f 1 for sufficiently high Ngen. As an additional comparison of the results obtained for ∆feas ) 0.25 and ∆feas ) 1, Figure 5 shows evolutionary progress plots7b (i.e., graphs of the lowest value of Rwp in the population as a function of generation number) for all GA calculations using these values of ∆feas for test molecules A and B. It is clear from Figures 3 and 5 that the technique described here for assessment of conformational feasibility leads, in general, to a significant increase in the success rate of the GA structure-solution calculation. Thus, for each of the test molecules A-D, the success rate is higher for all values of ∆feas than for the standard GA calculation (i.e., for ∆feas ) 1), with the only exceptions arising for ∆feas ) 0.02 in the case of test molecules A and B. As shown in Figure 3 and Table 2, the improvement in success rate is not critically dependent on the value of ∆feas, provided that the value of ∆feas is greater than ca. 0.05, whereas in general, the success rate appears to diminish

Enhanced Efficiency of Direct-Space Structure Solution

J. Phys. Chem. B, Vol. 111, No. 23, 2007 6353

Figure 2. Distribution of ∆N values for 100000 randomly generated structures for each of the test molecules A-D.

Figure 3. Cumulative success rate (S) versus generation number (Ngen) for GA structure-solution calculations with assessment of conformational feasibility for test molecules (a) A, (b) B, (c) C, and (d) D. In each case, results are shown for different values of ∆feas. For each test molecule and each value of ∆feas, the results show the cumulative success rates determined over 50 independent GA calculations.

slightly for values of ∆feas higher than ca. 0.5. For molecule A, the highest success rate arises for ∆feas ≈ 0.05-0.1, whereas for molecules B-D, the highest success rate arises for ∆feas ≈ 0.25. On the basis of these results, the use of ∆feas ≈ 0.25 would be recommended as a suitable value in the application of this strategy in cases of unknown structures, as this value gives close to maximum improvement in efficiency for all of the test molecules considered here. For this value of ∆feas, overlay plots

of the best structure solution obtained in each of the successful GA calculations for test molecules A and B are shown in Figure 6, clearly demonstrating successful convergence in each case. (For each of these molecules, the discrepancies between the structure solutions obtained from the independent GA calculations are clearly very small and are sufficiently minor that Rietveld refinement would lead to the same final refined structure in each case.)

6354 J. Phys. Chem. B, Vol. 111, No. 23, 2007

Hanson et al.

Figure 4. Graphs of S(∆feas)0.25)/S(∆feas)1) versus Ngen for test molecules (a) A, (b) B, (c) C, and (d) D. In each case, the results for S(∆feas)0.25) and S(∆feas)1) were determined over 50 independent GA calculations (the same calculations as used to construct Figure 3).

Figure 5. Evolutionary progress plots, showing the lowest value of Rwp in the population as a function of generation number, for all GA calculations using ∆feas ) 0.25 (left side, red) and ∆feas ) 1 (right side, black) for test molecules (a) A and (b) B.

The manner in which S(∆feas)0.25)/S(∆feas)1) varies as a function of Ngen reflects the relative rates of increase of S(∆feas)0.25) and S(∆feas)1) as Ngen increases. As is evident

from Figure 4, S(∆feas)0.25)/S(∆feas)1) generally increases in early generations, passes through a maximum, and then decreases in later generations (toward a value of 1 for sufficiently

Enhanced Efficiency of Direct-Space Structure Solution

J. Phys. Chem. B, Vol. 111, No. 23, 2007 6355 Concluding Remarks

Figure 6. Overlay plots of the asymmetric unit in the best structure solution obtained in each of the successful GA calculations for test molecules (a) A and (b) B.

large Ngen). This behavior is a direct consequence of the shape of the S versus Ngen graphs shown in Figure 3; thus, S(∆feas)0.25) rises rapidly in early generations and thereafter increases more slowly as S(∆feas)0.25) ) 1 is approached, whereas S(∆feas)1) shows a much slower increase in early generations. In early generations, the increase in S(∆feas)0.25)/S(∆feas)1) reflects the rapid increase of S(∆feas)0.25), whereas in later generations, S(∆feas)0.25)/S(∆feas)1) falls as S(∆feas)0.25) is already approaching its maximum value while the value of S(∆feas)1) continues to increase. In general, for each of the test molecules considered, S(∆feas)0.25)/S(∆feas)1) attains a value of at least 2 in early generations, before falling gradually at higher values of Ngen. The ∆N distributions shown in Figure 2 and Table 1 suggest that the use of ∆feas ≈ 0.25 would correspond to ca. 70-85% of trial structures being unaffected by the assessment of conformational feasibility (as they have ∆N < ∆feas), and thus the remaining ca. 15-30% of trial structures would be penalized by reduction of fitness. However, whereas random sampling has been used to construct the distributions shown in Figure 2, it is important to note that the GA structure-solution calculation does not represent a random sampling of structures and, in general, structures of higher than average quality are sampled, particularly in the later stages of the calculation. Thus, the actual percentage of structures penalized by reduction of fitness in the GA structure-solution calculations can be expected to be somewhat lower than the ca. 15-30% of trial structures (for ∆feas ≈ 0.25) deduced from the distributions shown in Figure 2.

As discussed above, the strategy reported here for the assessment of conformational feasibility within direct-space structure solution from powder X-ray diffraction data has been shown to give rise to significant improvements in success rate for all test cases considered. As structure solution of flexible molecules defined by a significant number of conformational degrees of freedom is widely acknowledged as one of the greatest challenges in direct-space structure solution, this strategy offers a promising new opportunity for improving the success rate in such cases. We note that the strategy leads to only a negligible increase in the total computational time for the GA structure-solution calculation, and for the calculations discussed above, the percentage increase in computation time was ca. 1.8% for A, 2.1% for B, 2.9% for C, and 3.1% for D.14 There is clearly considerable scope for future optimization of the methodology described here, for example by modification of the procedure for reduction of fitness for trial structures with ∆N > ∆feas. In the present work, the factor R-1 was kept constant for all trial structures with ∆N > ∆feas, but further improvement can be envisaged by making R-1 an appropriate function of ∆N, such that structures with higher values of ∆N are penalized more severely. This issue and other aspects of further optimization of the strategy are currently the subject of ongoing investigations. Whereas the present work employed a simple distance-based function to assess conformational feasibility, it is relevant to note that a number of other approaches have been reported in which information on energetic aspects of trial structures has been considered, together with the powder X-ray diffraction data, in direct-space structure-solution calculations. In general, the energetic information has comprised both intermolecular and intramolecular contributions, with the information on R factor and energy combined within an appropriate hybrid figure-ofmerit. In one strategy,15a the hybrid figure-of-merit uses energetic information to guide the calculation toward regions of parameter space corresponding to plausible structures, and R-factor is then used (under the control of a sliding weighting function) as the main criterion for comparing different energetically plausible structures. Other approaches have employed a fixed weighting parameter in such a hybrid figure-of-merit15b and the use of energy and R-factor information as different components within a Monte Carlo procedure.15c Another strategy, specifically targeted to address the problem of conformationally flexible molecules, incorporates information on the expected values of specific torsion angles to bias the directspace search to explore regions of conformational space corresponding to plausible conformations.15d Acknowledgment. We are grateful to EPSRC and Cardiff University for financial support and to Dr. Scott Habershon and Professor Roy Johnston for helpful discussions. Finally, we note that a different strategy (based on a redefinition of variable space) that also leads to a significant increase in success rate of direct-space structure solution calculations from powder X-ray diffraction data, particularly in the case of conformationally flexible molecules, has been reported recently.16 References and Notes (1) (a) Wales, D. J.; Scheraga, H. A. Science 1999, 285, 1368. (b) Horst, R.; Pardalos, P. M.; Thoai, N. V. Introduction to Global Optimization, 2nd ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2000. (c) Pı´nter, J. D., Ed. Global OptimizationsSelected Case Studies; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2003. (d) Wales, D. J.

6356 J. Phys. Chem. B, Vol. 111, No. 23, 2007 Energy Landscapes; Cambridge University Press: Cambridge, U.K., 2003. (e) Johnston, R. L., Ed. Applications of EVolutionary Computation in Chemistry; Structure and Bonding Series; Springer-Verlag: Heidelberg, Germany, 2004; Vol. 110. (2) (a) Schug, A.; Herges, T.; Verma, A.; Lee, K. H.; Wenzel, W. ChemPhysChem 2005, 6, 2640. (b) Morris, G. M.; Olson, A. J.; Goodsell, D. S. In EVolutionary Algorithms in Molecular Design; Clark, D. E., Ed.; Wiley-VCH: Weinheim, Germany, 2000; Chapter 3, p 31. (c) Schulz, F.; Hartke, B. ChemPhysChem 2002, 3, 98. (d) Darby, S.; Mortimer-Jones, T. V.; Johnston, R. L.; Roberts, C. J. Chem. Phys. 2002, 116, 1536. (e) Hamad, S.; Catlow, C. R. A.; Woodley, S. M.; Lago, S.; Mejias, J. A. J. Phys. Chem. B 2005, 109, 15741. (f) Woodley, S. M.; Battle, P. D.; Gale, J. D.; Catlow, C. R. A. Phys. Chem. Chem. Phys. 1999, 1, 2535. (g) Habershon, S.; Zewail, A. H. ChemPhysChem 2006, 7, 353. (h) Sanctuary, B. C. In EVolutionary Algorithms in Molecular Design; Clark, D. E., Ed.; WileyVCH: Weinheim, Germany, 2000; Chapter 10, p 195. (i) Amstrup, B.; Toth, G. J.; Szabo, G.; Rabitz, H.; Lorincz, A. J. Phys. Chem. 1995, 99, 5206. (j) Atabek, O.; Dion, C. M.; Yedder, A.B. J. Phys. B: At. Mol. Opt. Phys. 2003, 36, 4667. (k) Jagielska, A.; Arnautova, Y. A.; Scheraga, H. A. J. Phys. Chem. B 2004, 108, 12181. (3) (a) Harris, K. D. M.; Tremayne, M. Chem. Mater. 1996, 8, 554. (b) Harris, K. D. M.; Tremayne, M.; Kariuki, B. M. Angew. Chem., Int. Ed. 2001, 40, 1626. (c) Chernyshev, V. V. Russ. Chem. Bull. 2001, 50, 2273. (d) David, W. I. F., Shankland, K., McCusker, L. B., Baerlocher, C., Eds. Structure Determination from Powder Diffraction Data; OUP/IUCr: Oxford, U.K., 2002. (e) Harris, K. D. M.; Cheung, E. Y. Chem. Soc. ReV. 2004, 33, 526. (f) Tremayne, M. Philos. Trans. R. Soc. 2004, 362, 2691. (g) Altomare, A.; Caliandro, R.; Camalli, M.; Cuocci, C.; Giacovazzo, C.; Moliterni, A. G. G.; Rizzi, R.; Spagna, R.; Gonzalez-Platas, J. Z. Kristallogr. 2004, 216, 833. (h) Favre-Nicolin, V.; C ˇ erny´, R. Z. Kristallogr. 2004, 216, 847. (i) Shankland, K.; Markvardsen, A. J.; David, W. I. F. Z. Kristallogr. 2004, 216, 857. (j) Brodski, V.; Peschar, R.; Schenk, H.; Brinkmann, A.; Bloemberg, T. G.; van Eck, E. R. H.; Kentgens, A. P. M. J. Phys. Chem. B 2005, 109, 13529. (4) Harris, K. D. M.; Tremayne, M.; Lightfoot, P.; Bruce, P. G. J. Am. Chem. Soc. 1994, 116, 3543. (5) Hartmann, A.K.; Rieger, H. Optimization Algorithms in Physics; Wiley-VCH: Weinheim, Germany, 2002. (6) (a) Rietveld, H. M. J. Appl. Crystallogr. 1969, 2, 65. (b) Young, R. A., Ed. The RietVeld Method; International Union of Crystallography: Oxford, U.K., 1993. (c) McCusker, L. B.; Von Dreele, R. B.; Cox, D. E.; Loue¨r, D.; Scardi, P. J. Appl. Crystallogr. 1999, 32, 36. (7) (a) Kariuki, B. M.; Serrano-Gonza´lez, H.; Johnston, R. L.; Harris, K. D. M. Chem. Phys. Lett. 1997, 280, 189. (b) Harris, K. D. M.; Johnston, R. L.; Kariuki, B. M. Acta Crystallogr. A 1998, 54, 632. (c) Turner, G. W.; Tedesco, E.; Harris, K. D. M.; Johnston, R. L.; Kariuki, B. M. Chem. Phys. Lett. 2000, 321, 183. (d) Harris, K. D. M.; Habershon, S.; Cheung, E. Y.; Johnston, R. L. Z. Kristallogr. 2004, 216, 838. (e) Cheung, E. Y.; McCabe, E. E.; Harris, K. D. M.; Johnston, R. L.; Tedesco, E.; Raja, K. M. P.;

Hanson et al. Balaram, P. Angew. Chem., Int. Ed. 2002, 41, 494. (e) Cheung, E. Y.; Kitchin, S. J.; Harris, K. D. M.; Imai, Y.; Tajima, N.; Kuroda, R. J. Am. Chem. Soc. 2003, 125, 14658. (f) Guo, F.; Harris, K. D. M. J. Am. Chem. Soc. 2005, 127, 7314. (8) Habershon, S.; Turner, G. W.; Zhou, Z.; Kariuki, B. M.; Cheung, E. Y.; Hanson, A. J.; Tedesco, E.; Albesa-Jove´, D.; Chao, M.-H.; Lanning, O. J.; Johnston, R. L.; Harris, K. D. M. EAGERsA Computer Program for Direct-Space Structure Solution from Powder X-ray Diffraction Data; Cardiff University and University of Birmingham: Cardiff, Wales, and Birmingham, U.K., 2001. (9) Values of van der Waals radii due to Pauling were taken from those tabulated in the following paper (values of rb in Table I): Bondi, A. J. Phys. Chem. 1964, 68, 441. For all pairs of atom types in the test molecules considered in the present work, the values of mIJ were in the range 0.480.62 Å. (10) For ease of computation, our current implementation includes all atom pairs in the molecule, including atom pairs in “1,2” interactions (i.e., directly bonded atoms) and “1,3” interactions; the value of dij for these interactions is independent of molecular conformation, and these interactions therefore make a constant contribution to ∆tot. Alternatively, we could readily exclude contributions from 1,2 and 1,3 interactions to ∆tot, although our normalization of ∆tot (discussed below) effectively removes the constant contribution to ∆tot that arises from 1,2 and 1,3 interactions. (11) The reduction in fitness for structures with ∆N > ∆feas is not applied in the specific case of structures created by the mutation operation in a given generation, as previous studies12 have shown that there are evolutionary advantages in allowing all mutants created in a given generation to pass automatically into the next generation. (However, in all subsequent generations, these structures are subjected to reduction in fitness if ∆N > ∆feas.) (12) Habershon, S.; Turner, G. W.; Harris, K. D. M.; Johnston, R. L.; Johnston, J. M. Chem. Phys. Lett. 2002, 353, 185. (13) (a) Tedesco, E.; Harris, K. D. M.; Johnston, R. L.; Turner, G. W.; Raja, K. M. P.; Balaram, P. Chem. Commun. 2001, 1460. (b) Tedesco, E.; Kariuki, B. M.; Calcagno, P.; Harris, K. D. M.; Philp, D. manuscript in preparation. (c) Kariuki, B. M.; Calcagno, P.; Harris, K. D. M.; Philp, D.; Johnston, R. L. Angew. Chem., Int. Ed. 1999, 38, 831. (14) The absolute increase in computation time depends on the number of atoms in the molecule, which dictates the number of terms in the summation in eq 3, rather than the number of variable torsion angles. (15) (a) Lanning, O. J.; Habershon, S.; Harris, K. D. M.; Johnston, R. L.; Kariuki, B. M.; Tedesco, E.; Turner, G. W. Chem. Phys. Lett. 2000, 317, 296. (b) Putz, H.; Scho¨n, J. C.; Jansen, M. J. Appl. Crystallogr. 1999, 32, 864. (c) Brodski, V.; Peschar, R.; Schenk, H. J. Appl. Crystallogr. 2003, 36, 239. (d) Shankland, K.; David, W. I. F.; Csoka, T.; McBride, L. Int. J. Pharm. 1998, 165, 117. (16) Zhou, Z.; Siegler, V.; Cheung, E. Y.; Habershon, S.; Harris, K. D. M.; Johnston, R. L. ChemPhysChem 2007, 8, 650.