ARTICLE pubs.acs.org/JPCA
Molecular Properties from Conformational Ensembles. 1. Dipole Moments of Molecules with Multiple Internal Rotations Tal Lavy,† Daniel Harries,‡ and Amiram Goldblum*,† †
Laboratory of Molecular Modeling and Drug Design, Institute for Drug Research, and ‡Institute of Chemistry and The Fritz Haber Center, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
bS Supporting Information ABSTRACT: We present a novel method for constructing the stable conformational space of small molecules with many rotatable bonds that uses our iterative stochastic elimination (ISE) algorithm, a robust stochastic search method capable of finding ensembles of best solutions for large combinatorial problems. To validate the method, we show that ISE reproduces the best conformers found in a fully exhaustive search, as well as compare computed dipole moments to experimental values, based on molecular ensembles and their Boltzmann distributions. Results were also compared to the alternative molecular dynamics and simulated annealing methods. Our results clarify that many low energy conformations may be required to reproduce molecular properties, while single low energy conformers or ensembles of low energy conformers cannot account for the experimental properties of flexible molecules. Whereas ISE well reproduces conformations that are not separated by very large energy barriers, it has not been successful in reproducing conformations of strained molecules.
1. INTRODUCTION Physico-chemical properties of molecules and their resulting molecular biological activities depend on their three-dimensional structures or conformations. A common property of many biologically active molecules is that rotations about single bonds account for most of the conformational flexibility, enabling structural adaptation to specific environments and functions such as receptor binding, membrane penetration and other biological activities.1,2 The conformational ensemble (CENS) of a molecule may be defined as that part of its conformational space that is populated, i.e., accessible energetically according to the Boltzmann distribution. Characterization of the CENS of a molecule in its environment should enable us to predict its physicochemical properties that depend on conformational variations.3 Searching conformations in silico has been a major issue for theoretical development since the emergence of computational chemistry.4,5 Several methods have been devised for systematic conformational search, such as grid search, search trees, or fragment-building approaches.6 However, for molecules with more than 5-6 rotatable bonds, the large number of potential combinations of rotation angles produces a huge combinatorial computation problem, which requires stochastic methods such as Monte Carlo (MC) and genetic/evolutionary (GA, EA) algorithms.7 Alternate methods involve simulation methodology. r 2011 American Chemical Society
One relatively simple approach is the combined molecular dynamics (MD) with simulated annealing (SA), where the idea is to enhance conformational sampling by starting simulations at high temperatures followed by a series of gradual cooling annealing steps to the required temperature.8a Later, additional, more advanced methodologies were devised, including the celebrated replica exchange methods.8b Grid search and similar methods supply a good coverage of conformational space but suffer from long computation times and require coarse grids that can miss important parts of conformational space. Deterministic methods (MD, SA, and their combinations) require long computation times for molecules having many rotatable bonds, and high energy barriers between conformations are not easy to cross. Novel optimization methods combine stochastic and exhaustive methods to improve the search for global minima.9 Such an “evolution of methods” was developed at the lab of Harold Scheraga, who devised a combined search for improving the detection of minima in the pentapeptides Leu- and Met-enkephalin.9-11 Stochastic methods can Special Issue: Victoria Buch Memorial Received: September 16, 2010 Revised: November 27, 2010 Published: January 06, 2011 5794
dx.doi.org/10.1021/jp108837a | J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A cover larger parts of conformational space in shorter computation times but cannot guarantee the finding of global and closely related minima, could be “caught” in local minima, and may miss important conformations. In this paper, we propose to use a novel generic method developed in our lab, and named iterative stochastic elimination (ISE), that produces the conformational ensembles of flexible molecules. Briefly, ISE is a generic discrete combinatorial optimization method that employs statistics on the best and worst results of a large, stochastically picked, sample, to eliminate specific values of its variables, and to reduce the size of successive samplings. Eliminations are performed in several iterations, leading to a smaller set of combinatorial possibilities. Exhaustive computations of all remaining possibilities follow these iterations once the total number of combinations has been reduced to a computationally feasible one, and sorting all these possibilities produces a very large set of best results for a problem. Problems of sizes 10100 and larger have been studied with ISE, and reproduction of the global minimum and of the most relevant local minima by ISE have been demonstrated in problems of ∼109 alternatives, such as protein side chain conformations,12 by comparing ISE to full exhaustive computations. Furthermore, ISE has been applied successfully to solve the positioning of polar hydrogens in crystallographic structures,13 to construct multiple loops in homology modeling,14 to study the conformations of cyclic peptides,15 to solve problems in protein-protein interactions16 and in flexible ligand docking, and to identify alternative binding modes of ligands.17 ISE does not require lengthy computations and is not limited by energy barriers or by spending computation time on repeated sampling of nonproductive regions of conformational space. In producing conformational ensembles, we eliminate the parts of conformational space that prove to produce conformations that are the worst solutions, where best and worst are judged by energy values of each sample. Following the production of CENS of molecules, we here demonstrate the relevance of these solutions for predicting dipole moments of molecules, as an example of measurable molecular properties that are extremely sensitive to molecular conformations.18,19 Our ability to simulate dipole moments with high quality provides a basis for predicting other molecular properties through the production of relevant conformational ensembles of molecules.
2. METHODS 2.1. General Information. Variables and variable values form the basis for formulating combinatorial problems to be solved by ISE. In the present application, a variable may be a conformational descriptor such as an atom-atom distance, or internal coordinates characterized by bond lengths, bond angles, and dihedral angles. Each such variable should be represented by a set of discrete values that may be generated by computation, imposed by the user, or extracted from databases or other resources. In the present application, we have chosen to limit the variables to dihedral angles around single bonds, which are the most noticeable source of conformational modifications. Each of the variable dihedral angles may assume a large set of discrete values, and the ISE algorithm searches for a set of optimal conformations that are combinations of these values. Subsequently, these conformations are generated to compute the physicochemical properties of the conformational ensemble. Here, we provide an example of such computations of
ARTICLE
the molecular dipole moment and describe the set of steps taken to calculate the CENS. 2.2. Construction of Conformations. Each conformation is constructed by random simultaneous assignment of values for all dihedral angles of rotatable bonds, thus generating coordinates of new molecular conformations. These values are applied to an initial conformation derived from the PDB database of ligands19 or constructed using fragments from Sybyl 7.2.20 We begin by minimizing the initial conformation of a molecule (or “ligand”) prior to the application of ISE. This relaxation step is performed prior to conformer generation because initial conformations are constructed from building blocks in Sybyl. It is performed to optimize bond lengths and bond angles, which are not modified by ISE in the current version used here, while dihedral angles are modified. Subsequently, we assign random values to each of the dihedral angles. These random values are picked from a table of variable values that initially contains a full set of value options for each angle. The number of values is reduced with further iterations, as explained below. Figure 1 demonstrates a table of the variables, in which the dihedral angle variables can have values in increments of 30 and indicates (with an x) values that have been eliminated in previous iterations. These values will not be assigned to the specific dihedral angles in the next and in subsequent samples. A table of variables may include a large number of variables, each having many possible values (of order tens to hundreds) that depend on the choice of size of increment (angular in the case of dihedral angle variables). The total number of potential combinations to be sampled for a specific iteration Ntotal is Ntotal ¼
Nvar Y i¼1
Nvar, i
ð1Þ
where the product is taken over Nvar,i and Ntotal is the total number of values remaining for each of the Nvar variables. For 10 rotatable bonds with 36 conformations at increments of 10 each, the initial number of alternatives for the first iteration is 3610 ≈ 3.65 1015, a number that is clearly out of the scope of any systematic search. 2.3. Sampling. Each set of randomly chosen values of the dihedral angles produces a single conformation X. We sample N such conformations and score each of them. The energy Ei of conformation Xi is computed with a suitable force field (vide infra). For the set of conformations X1 through Xn we obtain a set of energies {E1, E2, ..., En}. 2.4. Elimination of Values. We sort the N sampled conformations ranked according to their energies and construct a distribution histogram for these conformations. Group L of conformations contains a set of the NL lowest energy conformers, and group H contains a equally sized (NL = NH) set of highest energy conformers. Since N is large, the picked variable values (prior to weighting according to their Boltzmann weights) are assumed to be sampled with equal probability. The probability of appearance of each variable value i of a dihedral angle k in the groups of lowest and highest energy conformations should thus be Pi, k ðNL Þ ¼ Pi, k ðNH Þ ¼ Pi, k ðNÞ 3
NL NH ¼ Pi, k ðNÞ 3 N N
ð2Þ
where Pi,k (N) is the number of times a variable is expected to appear in the full sample and Pi,k (NH) is the number of times the 5795
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 1. Variables table representing the possible values for seven dihedral angles at a rotation increments of 30. Eliminated values are designated by X.
value i of variable k (dihedral angle) is expected to appear in the highest (worst) energy group. A variable value i of a dihedral angle k appearing in NL much less than its expected value Pi,k(NL) or appearing in NH much more than its expected value Pi,k(NH) indicates that the value i contributes mostly to higher energy conformations and may therefore be eliminated from the table of optional values and not be used to generate subsequent conformations. Thus, the number of values contributing to higher energy conformations is reduced. As a result, we expect an increased coverage of the lower energy regions on the molecular potential energy surface. To eliminate values, elimination factors have been assigned: evfL is the factor for the group of lowest energies L and evfH for the group of highest energies. The conditions for eviction are: if the number of appearances in the low energy group, Ai,k(NL) is Ai, k ðNL Þ < evf L 3 Pi, k ðNL Þ,
ð3Þ
or the number of appearances in the high energy group Ai,k(NH) is Ai, k ðNH Þ > evf H 3 Pi, k ðNH Þ
ð4Þ
then variable value i, k may be assigned for elimination. It may be finally eliminated if it fulfills these conditions a single time or only after fulfilling them a multiple of CR times (in a few repeated samples of N conformations). The table shown in Figure 2 presents the number of times a specific variable met the eviction criteria in a set of samples of the dihedral angle values in a molecule. In the current realization of the method, only those values that fulfill the eviction criteria CR =10 repeated times are eliminated. After the values considered worst are eliminated, a new sample is generated, this time using the remaining values, and the process is repeated until the number of possible combinations has been reduced to a point from which the remaining combinations can be exhaustively computed within a reasonable time.
Figure 2. Variables table representing the possible values for seven dihedral angles at rotation increments of 30. For each value, the number of times (1-10) the criteria for elimination was repeatedly met is shown. When a threshold of 10 repeated fulfillments of criteria (CR = 10) is reached, that value is eliminated.
Figure 3. Several conformations of ethyl linoleate.
2.5. Salvaging Erroneous Eliminations. It may be clear from the principles of ISE stated above that treatment of dihedral angles that contribute significantly to both the high and low energy groups could pose a problem. An example of such a case would be a long hydrocarbon that has low energy conformations when it is fully extended, and high energy conformations when its ends are coiled up on itself (see Figure 3). Values of dihedral angles at the center of the molecule might be eliminated by mistake for frequent appearance in the high energy group, resulting in the loss of low energy minima. Therefore, a special procedure for returning erroneous eliminations was added in this application of the ISE algorithm. According to this “salvage” procedure, we divide the total sample in each iteration into three parts. The high energy (NH) and low energy (NL) groups as described in section 2.4, and the median group, composed of all remaining conformations. The number of appearances of each dihedral angle is counted in the lowest energy group, the median energy group, and the highest energy group and then normalized to the relative size of the group. If a value eliminated in an iteration is found to have a normalized number of appearances in the median group that is lower than 5796
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A the normalized numbers of both the high and low energy groups, the elimination is canceled, and the value is returned to the table of values. 2.6. ISE Conformational Search Used To Reproduce Dipole Moments. ISE conformational searches were performed using eviction factors (evf values as in eqs 3 and 4) of 0.6 and 2.0 for the lowest and highest groups, respectively. High and low energy group sizes (NL, NH) of 10% of the total sample size were used at the initial stages of the search and increased to 20% if the highest energy member of the randomly sampled lowest group was in the range of 1.5 times the lowest energy conformer sampled in the same iteration, indicating that the conformational energy surface has become flat. These parameters have been derived from previous optimization calculations described in the Supporting Information. Fulfillment of elimination criteria of 10 times was used for any elimination (CR = 10, see section 2.4); i.e., the elimination criteria had to be repeatedly satisfied 10 times before final elimination of a value is executed. After the elimination criteria were fulfilled 5 times, the salvaging values option (see section 2.5) was activated. Both these parameters were studied in the optimization phase and represent a practical compromise with the need to keep the number of required iterations to a minimum. ISE calculations and the elimination of variable values were performed for each molecule up to a threshold of 106 combinations (conformational options due to the remaining, nondiscarded values of conformational variables). These remaining conformations were then exhaustively scored. Whenever ISE could not further eliminate while the number of combinations did not reach the 1 106 threshold, an exhaustive search was performed on condition that the remaining combinations were less than 3 106 conformations. Approximate run time for an ISE iteration containing 1000 conformations using in-house Cþþ software was 0.2 s on a dual Xeon PC running GNU/Linux OS. Energy evaluation time using the Sybyl 7.2 package is 25-55 s per 1000 energy evaluations (with a single conformation for each), depending on the size of the molecule. The results of the exhaustive search were sorted and ranked according to the conformational energy, and subsequently dipole moments of these conformations were computed and summed according to their Boltzmann distribution weights as described below in section 2.8. 2.7. Choosing the Charge Scheme and Calculation of Dipole Moments. The electrostatic properties of a molecule are determined by the distribution of its electrons and nuclei. Unfortunately, the partial atomic charge is not an experimentally observable quantity, nor can it be unambiguously computed from the wave function using quantum mechanics. Indirect comparisons of various methods for computing partial atomic charges are possible, usually by calculating appropriate experimental quantities and comparing them to experimental results.21 We computed partial atomic charges and the resulting molecular dipole moments of 45 small molecules that lack rotatable bonds, or for which rotation around bonds is insignificant, so that their dipole moments are largely determined by single conformations. Partial atomic charges were computed using nine different methods implemented by the commercial packages of Sybyl 7.2 and Insight II molecular modeling software. In Sybyl 7.2, dipole moments are computed from molecular point charge distributions. The magnitude and X, Y, Z components of the dipole moment are reported in the output.22
ARTICLE
The results for the molecular dipole moments were compared to available experimental results. Figure S9 in the Supporting Information shows the linear correlation to experimental results extracted from the CRC Handbook of Chemistry and Physics.23 The best line, correctly predicting about 90% of the dipole moments, was found to be associated with MMFF charges. In reproducing experimental results for dipole moments, the MMFF force field performed significantly better than others (R2 = 0.8896). We find this to be consistent with the results of Halgren et al. who found that the best correlation of energies to experimental and ab initio results could be achieved by using MMFF with MMFF charges.24 We therefore selected this force field for all subsequent dipole moment value calculations. However, MMFF was much too slow for employing it to compute energies for the various conformations produced by ISE, while TRIPOS force field performs about 10 times faster. Once the method for computing atomic charges was established, we used it to calculate dipole moments for all the conformations in the optimized ensemble (the set of lowest energy conformations) of each molecule, resulting from the ISE conformational optimization. 2.8. Calculation of Dipole Moment in Ensembles of Conformations. The set of molecules for which dipole moments were evaluated and compared to experiments is presented in Figure 4. The total dipole moment of a molecular conformational ensemble was computed by adding the weight Fi of each conformer according to Boltzmann's probability distribution, using energy differences derived from the force field as applied by ISE, Fi ¼
ni expð - Ei =kB TÞ ¼ P N expð - Ei =kB TÞ
ð5Þ
i
Here, Fi is the fraction of conformer i in the ensemble, ni is the number of times this conformer is found in the total ensemble, N is the total number of conformers in the ensemble, Ei is the conformer's energy, T is the temperature, and kB is Boltzmann's constant. Each conformer fraction Fi is then multiplied by its specific conformation-dependent dipole moment to give the thermally averaged dipole moment of the entire ensemble, or the ensemble dipole moment (EDM): X Dtotal ¼ Di F i ð6Þ i
As with other physicochemical properties, the thermally averaged dipole moment ÆDæ of an ensemble of conformations may match the property measured experimentally for a large number of molecules (on the scale of Avogadro's number).25 Calculation of the average dipole moment for a specific molecule was performed by consecutively adding to the sum in eq 6 the associated dipoles of progressively higher energy conformations, until reaching convergence in the value of ÆDæ. 2.9. Scoring Function. In this study the function by which conformations are scored is related to their internal energy. The choice of the scoring function is of utmost importance since both the ranking of conformers and their contribution to ensemble properties are determined by the value of the scoring function. As we are dealing with molecules that could have a large variety of functional groups, a scoring function that was produced for a larger range of molecular entities is required. We employed the Tripos force field for the energy evaluations along the iterative 5797
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 4. Flexible molecules analyzed for comparison with experimental dipole moments. Values associated with these molecules (no. of rotatable bonds, size of conformational space, and experimental dipole moment) are given in Table 1 below.
steps, and MMFF in the final step for evaluating the dipole moments of lowest energy ensemble. Both Tripos (which is much faster than MMFF) and MMFF force fields have been extensively used and documented as being highly suitable for analyzing small molecules.24,26 The total energy was computed using these force fields as implemented in Sybyl 7.2. 2.10. Molecular Dynamics and Simulated Annealing Calculations. To gain confidence in the results obtained by using ISE, and to determine whether ISE is advantageous over other widely known methods, we compared the ISE conformational ensembles to conformational ensembles derived from molecular dynamics (MD) and simulated annealing (SA) calculations performed on the same molecules. Initial molecular conformations were randomly generated using the same conformational generator used for ISE and subsequently minimized and submitted to SA under the same forcefield and charge scheme as ISE. SA was performed with an initial temperature of 10 000 K and a final temperature of 293-298 K according to the relevant experiments. The annealing time was 10 ns with the stepwise annealing method. The final conformation from SA was subjected to MD runs of 100 ps with a time step of 1 fs and a temperature coupling factor of 5f S(fs). Conformations were extracted along the trajectory (1 at each 50 fs), and their ensemble dipole moment (EDM) was calculated as a simple average over the
2000 extracted conformations (note that the Boltzmann averaging is already explicitly considered along the simulation).
3. RESULTS AND DISCUSSION To reproduce the dipole moment of flexible molecules, it is required to construct in silico the conformational ensemble of a molecule; thus we questioned first the ability of ISE to reproduce an ensemble of conformations that is easily produced in an exhaustive search over the full range of variable values. Reproducing the results of an exhaustive search with ISE isolates the combinatorial issue and eliminates other factors that might interfere with a comparison, most of all that of force field suitability. To maximize the similarity of the search methodology, both ISE and the exhaustive search used the same in-house built program to generate conformations, the first in a random manner and the latter in an exhaustive manner. We first searched for the best parameters to reproduce the results of an exhaustive search by ISE. Comparisons of ISE to full enumeration can be performed only for relatively small systems, because of the time required for exhaustive large combinatorial problems. Therefore, we have performed comparisons for molecules with up to 3 106 conformational alternatives and compared the lowest energy ensembles obtained by the two methods. We found that 100% of the lowest energy conformations in the 5798
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A exhaustive computation were reproduced by the ISE computation, in a fraction of the time required for the exhaustive search (see Supporting Information for details). Constructing conformational ensembles is necessary not only for producing conformational ensembles that determine average properties but also to question the ability of a molecule to reach its bioactive conformation, where it is bound to a protein or another biomolecule. In fact, bioactive conformations of flexible ligands rarely correspond to their free state global energy minimum conformations, and in many cases do not even correspond to a local energy minimum.27,28 For example, Perola et al. observed enthalpy differences of up to 30 kcal/mol between bound conformations and calculated minimum energy conformers in flexible ligands.3 3.1. Reproducing Molecular Dipole Moment of Flexible Small Molecules Using Ensembles of Low Energy Conformers. To test the validity of the conformational ensemble search with ISE, we chose 6 flexible molecules for which experimental dipole moments were published,29 as shown in Table 1. We chose noncyclic molecules, with 7-16 rotatable bonds and dipole moments ranging from ∼1.8 to ∼4.4 D, and lacking chemical groups that might form aggregates in high concentrations, thus distorting the dipole moment values. Molecular structures were built using Sybyl 7.2. In Table 1, the second column presents the initial size of the combinatorial search space, the third presents the number of rotatable bonds, and the experimental dipole moment appears in the last column. ISE search was performed with the Tripos force field as described in sections 2.2-2.7, and was stopped at a stage where approximately 106 alternative conformations remained after rejecting variable values. These conformations were exhaustively searched and ranked according to their total energy. Dipole moments and total energies of the conformational ensemble were computed by the MMFF force field up to a maximum threshold of 4.0 kcal above the global minimum but converged in most cases at lower energy thresholds. The partial atomic charges were assigned as implemented in Sybyl 7.2. Figures 5-10 present the conformations (Figures 5-10, upper panels), ensemble dipole moments (EDM) (Figures 5A-10A, middle panels), and the SA/MD dipole values of each of the 6 examined molecules (Figures 5B-10B, lower panels). To demonstrate the changes in dipole moment following the contribution of each conformer, EDM was computed after introducing each additional conformer into the ensemble. All energy calculations were performed using the same dielectric constant as that of the solvent used in the corresponding experiment. Distribution weights were computed according to eq 5, as described in the Methods, using the temperature reported in the corresponding experiment. In most cases, only a few lowest energy conformations were required to reflect the experimental dipole moment. In other cases, EDM was computed by including a large number of conformations, until reaching convergence. Convergence is achieved as the contribution of higher energy conformations is considered. Their Boltzmann weighted contribution to the ensemble and therefore to the total dipole moment of a molecule diminishes and reaches convergence. The diversity of conformations constituting the ensemble is graphically depicted for each molecule in the upper panels of Figures 5-10. The conformations appearing in these figures are the result of clustering, with a requirement for a minimal 0.5 D between computed dipole moments of these conformations. The EDM graphs included, however, all conformations, as shown in panels A in Figures 5-10.
ARTICLE
Each representative conformation (Figures 5-10) is designated by a letter that appears on the EDM graph (Figures 5A-10A) of the corresponding molecule, as was found in the ensemble. Each figure also indicates the values of individual dipole moments (IDM) computed for the conformations shown, as well as the energy threshold at which it was found in the ensemble. Thus, Figures 5-10 (upper panels) serve to demonstrate the diversity of conformations and associated dipole moments found in this set of molecules with many rotatable bonds, as well as, in many cases, the distance of these individual conformations from the overall computed dipole moment. Plots in Figures 5B-10B, which represent ensembles from SA and subsequent MD runs, are assumed to be ergodic and to be at thermal equilibrium, so that each of their constituting conformations are assumed to contribute equally to the total EDM value, as they should appear with their Boltzmann weight. Each line in these graphs represents the dipole moment for a specific molecule found in a conformation that was picked from MD at the production stage, once in 50 fs. As the particular conformational variations of a molecule are extremely relevant for understanding its ensemble properties, we present a brief discussion of each of the examined molecules. 3.1.1. Ethyl Linoleate. This long chain flexible molecule may, theoretically, adopt conformations in which the unsaturated bonds may be situated cis or trans with respect to each other. Only trans conformations (purple colored part of Figure 5) were found in the ensemble up to an energy threshold of 12 kcal/mol above the conformation at the global minimum. Figure 5 displays four conformers possessing distinct dipole moments, as calculated in this ensemble. The only relevant conformers for computing EDM were A type conformers as the other conformers of B, C, and D type are found at energy thresholds that are too high to significantly contribute to the EDM. Therefore, Figure 5A shows a straight line for ÆDæ at the value of 2.19 D (green symbol), which is also the dipole moment of A type conformers. The large differences between the dipole moments in these conformers are mainly due to the internal conformations of the ester group of the molecule. Further structures in the ensemble (not shown, to maintain figure clarity) similarly display an unlatered effect of the ester function, which is negligibly affected by rotation around the ethyl ester oxygen bond. The ensemble derived by SA and MD, depicted in Figure 5B, shows the dominance of type A conformers having a dipole moment around 2.5 D, with a total EDM of 2.28 D. We find that both the ISE ensemble production and the SA/MD reach a very similar result in this case for the prevailing molecular conformations, and therefore also their average dipole moment. 3.1.2. Dipropyl 1,2-Benzenedicarboxylate. The experimental dipole moment of this molecule in benzene at T = 25 C was found to be 2.84 D.30 Cluster analysis shows that this molecule adopts three main conformations, as shown in the upper panels of Figure 6. The low energy portion plotted in Figure 6A is dominated by type A conformers, which have carbonyls pointing in different directions, thereby reducing the EDM by vector cancelation. At the higher energy threshold (0.7 kcal/mol) the contribution of conformations B (4.54 D) and C (3.94 D) becomes more significant, increasing EDM up to a computed value of 2.63 D. This is an excellent example of the unsuitability of a single computed low energy conformer to account for experimental dipole moments, while a reasonable value can be obtained by 5799
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Table 1. Flexible Molecules Analyzed for Reproducing Experimental Dipole Moments
using the ensemble approach. The total value of EDM computed from contributions of all conformers is 2.63 D, closer to the experimental value (2.84 D) than any of the individual conformations,
reflecting the importance of the ensemble in reproducing the experimental result. The SA/MD ensemble, shown in Figure 6B, shows a small number of conformations around the value of 5800
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 5. Conformational possibilities of ethyl linoleate; conformations that contribute to dipole moments different from those of the global minimum (conformation A) are much higher in energy than the lowest one, which is therefore the main contributor: (A) ISE ensemble dipole moment (EDM) of lowest energy conformations of ethyl linoleate; (B) Simulated annealing and molecular dynamics EDM of ethyl linoleate.
2.8 D, while the majority of conformations found are of type A or in between types B and C. The averaged result is again very close to the experimental. 3.1.3. Diethyl 9-Butyl-9H-carbazole-3,6-dicarboxylate. The experimental dipole moment of this molecule in benzene at room
temperature is 3.9 D.30 Cluster analysis results in 3 distinct groups of conformations (Figure 7), two with carbonyls pointing in a mutually parallel (C) and near-perpendicular directions (B) that have higher dipole moments (6.17 and 4.41 D, respectively), and one with carbonyls pointing in nearly opposite directions, 5801
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 6. Conformational possibilities of dipropyl 1,2-benzenedicarboxylate. Two lowest energy conformations contribute most of the values for the ensemble and for the resultant dipole moment: (A) ISE ensemble dipole moment of dipropyl 1,2-benzenedicarboxylate; (B) SAþMD ensemble dipole moment.
thus partially canceling each other (A) and achieving a lower dipole moment (0.79 D).
In the lowest energy portion of Figure 7A, conformer A dominates with its low dipole moment. The contributions of 5802
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 7. Three conformations with very close energies to each other but widely differing dipole moments are the main contributors to the ensemble and properties of diethyl 9-butyl-9H-carbazole-3,6-dicarboxylate.: (A) ISE ensemble dipole moment of diethyl 9-butyl-9H-carbazole-3,6-dicarboxylate; (B) SAþMD ensemble dipole moment of diethyl 9-butyl-9H-carbazole-3,6-dicarboxylate.
5803
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 8. Many conformations with close energies and several different dipole moments form the weighted total for (2,4,6-trimethyl1,3-phenylene)bis[phenyl methanone]: (A) ISE ensemble dipole moment of (2,4,6-trimethyl-1,3-phenylene)bis[phenyl methanone]; (B) SAþMD ensemble dipole moment. 5804
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 9. The global minimum energy conformation (A) determines to a large extent the final result in this case of diphenyl 9-(3-methylbutyl)-9H-carbazole3,6-dicarboxylate: (A) ISE ensemble dipole moment of diphenyl 9-(3-methylbutyl)-9H-carbazole-3,6-dicarboxylate; (B) SAþMD ensemble dipole moment.
5805
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
Figure 10. While the lowest energy conformation of [1-(1,1-dimethylethyl)-2,2-dimethylpropyl] oxalate supplies a dipole moment that is very close to experimental, higher energy conformations that are included by computationally assuming a Boltzmann weighted equilibrium reduce this value to a lower dipole moment. The correlation between experiment and calculation breaks down here due to the wrong assumption that the equilibrium between all conformations exists, while not taking notice of the barriers for conformational interconversion in this highly strained molecule. In this case, the ISE algorithm fails while SA þ MD are successful. (A) The ISE ensemble dipole moment of [1-(1,1-dimethylethyl)-2,2-dimethyl-propyl]. (B) SAþMD Ensemble dipole moment. 5806
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A conformations B and C are found at a slightly higher energy threshold (0.024 kcals/mol), increasing the EDM up to a value of 3.9 D. The SA/MD ensemble (Figure 7B) shows considerable contributions of type A and B conformations but none of conformation C; therefore, the total value of EDM reaches only 1.28 D, much lower than the experimental in this case. In this case we suggest that SA/MD did not reach ergodicity and could not reproduce the low energy conformation clusters (type C), while ISE reproduced all three types of clusters and was thus able to closely reproduce the experimental dipole moment. 3.1.4. (2,4,6-Trimethyl-1,3-phenylene)bis[phenyl methanone]. The experimental dipole moment of this molecule in carbon tetrachloride at 25 C is 3.0 D.30 The extreme flexibility of this molecule, with almost no restrictions on rotations around dihedral angles, results in a diverse group of clusters (Figure 8), ranging from low dipole moments (such as I, 0.44D) in which carbonyls point in opposite directions, to high dipole moments (such as G, 8.77 D) in which carbonyls point in the same direction. The experimental dipole moment was reproduced with the lowest energy conformation (3.0 D) and again with an ensemble of the lowest energy conformations, which converged at a threshold of approximately 1.16 kcal (Figure 8A) to a value of 3.02 D. In this case, both approaches, one that uses only the low energy conformation, or the other that considers the whole ensemble, reproduced values that are close to the experimental dipole moment. The SA/MD ensemble includes the whole array of conformation types presented in Figure 8B, ranging from 0.5 to 7.3 D. However, when all contributions are combined, the cumulative value of the EDM value of 3.55 D is close to the experimental value of 3.0 D, again reflecting the importance of considering the complete ensemble. 3.1.5. Diphenyl 9-(3-Methylbutyl)-9H-carbazole-3,6-dicarboxylate. The experimental dipole moment of this molecule was determined in benzene at 25 C to be 4.39 D.30 Clustering shows four dominant types of conformers (Figure 9), two of them (A and C) possessing carbonyl oxygens pointing in approximately the same direction, thus reaching large dipole moments (5.28 and 7.18 D, respectively) and the other two (B and D) with carbonyls pointing in directions that nearly cancel, thus achieving low dipole moments (1.7 and 2.27 D, respectively). The experimental dipole moment is not close to that of any of the individual conformers, but the ensemble average shown in Figure 9A converged at a threshold of approximately 1.0 kcal to a value of ∼4.8 D and included conformer types A, B, C, and D. The SA/MD ensemble (Figure 9B) shows diverse conformations with dipoles ranging from 0.5 to 9.4 D. However, most of the individual conformations have a dipole moment around the value of 4.0 D, corresponding to the conformational cluster type A, which was also found to have the lowest energy in the ISE search. This ensemble gives a total EDM value of 3.92 D. 3.1.6. Bis[1-(1,1-dimethylethyl)-2,2-dimethyl-propyl] Oxalate. The dipole moment of this molecule in benzene at 20 C was experimentally found to be 3.3 D.30 Due to its bulky methyl groups (see Table 1 for structures), rotation around the central carbon-carbon bond of this molecule causes intramolecular clashes. The cis conformation is, therefore, unfavorable and was not found among the low energy conformations. Figure 10 shows planar views of the trans conformations in the plane of the oxalate group, demonstrating the diversity of bulky groups configurations and this group's influence upon IDM. The IDM of the lowest energy conformation (A, 3.39 D) reflects the experimental
ARTICLE
dipole, while further enlargement of the ensemble to a threshold of approximately 1 kcal (B-E) lowers EDM to approximately 2.4 D. This further enlargement of the ensemble adds conformers F (3.92 and 4.45 D, respectively) to the ensemble, but their statistical contribution to the ensemble is close to zero (see blue line in Figure 10A). This molecule, with bulky groups and relatively few low energy conformations, is much different from the other molecules, in that its properties are determined by the global minimum alone and not by an ensemble of conformations. Due to its strained structure, many high energy barriers separate the energy minima, and crossing these barriers is difficult. Thus, the molecules tend to remain at their lowest energy minimum. In our computational method minimum energy conformations are seen and treated as ensemble members. Therefore, such barriers are not adequately taken into account. Hence, ISE overestimates the number of conformations that can contribute to an ensemble and to its properties. Thus in cases of highly constrained molecules with high energy barriers between conformations, ISE should focus only on finding the global energy minimum and not attempt to identify ensembles of conformations that do not contribute to the molecule's properties. The SA/MD ensemble (Figure 10B) is mainly composed of conformations possessing dipole moments around the experimental value (3.3 D). This result again supports the conclusion that high energy barriers will strongly bias/direct flexible molecules to adopt low energy conformations. Computational searches may find conformations of slightly higher energies than the lowest (types B-G), but these conformations would have no physical significance since the bulk of molecules will tend to stay in the lowest state surrounded by high energy barriers. The SA/MD ensemble, having approached ergodicity, cannot allow conformations to reach over the high energy barriers from the initial, SA minimized, lower energy conformation (type A), and thus manages to better reproduce the experimental value than ISE, in this case.
4. CONCLUSIONS We have shown here that reproducing physical properties of conformationally flexible molecules requires the proper generation of a conformational ensemble of these molecules, because many physical properties vary sensitively with different conformations. One such property, which is highly dependent on conformation, is the molecular dipole moment, which results from the charge distribution and the distances between the positive and negative charge densities in the molecule. We have shown that, in most cases, using the global minimum or a small number of low energy conformers, obtained by conformational search, cannot account for the experimental properties of flexible molecules. The number of conformations found within a given energy threshold of an ensemble is a measure of the flexibility of the molecule. This apparent flexibility depends to some extent on the rotation increments used for computation and is not a proof of the prevalence of these conformations. Clustering these conformations into representative conformations can help map primary candidates for the conformational ensemble. Summing up each conformation's Boltzmann weighted contributions in ISE ensembles makes it possible to calculate experimental quantities that would have been impossible to calculate with individual, low energy conformations or with ensembles of multiple 5807
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A minima obtained by SA/MD simulations. The determination of the optimal ensemble threshold may involve system-specific considerations that depend strongly on the structure of the molecule and the accuracy of the force field used for the conformational search. We have also demonstrated that for a molecule with bulky groups and high energy barriers it can be anticipated that ensembles would not give a satisfactory analysis. ISE is based on eliminating conformational parameters that contribute consistently to high energy conformations. Thus, ISE eliminates high energy portions of the conformational space (including high energy barriers, if they exist for a specific molecule) and finally finds a set of low energy minima, which are thermodynamically favored. These low energy minima may contribute less in reality in those cases that require energy barriers to be crossed to reach these minima, a process that may be very slow and impractical due to the height of the barriers. In these relatively rare cases of very crowded yet flexible molecules, ISEgenerated ensembles may be misleading and are not expected to yield useful data beyond finding the global minimum conformation. More generally, computation of correct conformational ensembles can gain new insights into the accessible molecular conformational space and therefore enables us to consider entropic aspects that cannot be accounted for by other methods. This ability goes beyong the previously established suitability of the ISE algorithm for producing sets of best solutions to highly combinatorial problems. As is well-known, quite a few biological compounds and biologically active molecules have long sequences of rotating bonds and can adopt a multitude of energetically close conformations with relatively small energy barriers, therefore requiring study of their conformational ensembles. These molecules are more easily treated by ISE, which produces such ensembles faster and more efficiently than SA/MD, and thus ISE serves as a promising method to model such molecules.
’ ASSOCIATED CONTENT
bS
Supporting Information. Description of the process of optimizing the parameters of the iterative stochastic elimination algorithm in its application to the formation of appropriate conformation ensembles. Graph of results for the choice of the atomic charge scheme that is consistent with dipole moments of rigid molecules. This material is available free of charge via the Internet at http://pubs.acs.org.
’ AUTHOR INFORMATION Corresponding Author
*E-mail:
[email protected].
’ ACKNOWLEDGMENT We thank Regina Politi for her extensive help with the figures of this paper. T.L. thanks Dinorah Barasch, Amit Michaeli, David Marcus, Andrea Scaiewitz, and Ilanit Maymoni for help during his research. A.G. and D. H. dedicate this paper to the memory of Victoria Buch. Her passion for science as well as her teachings in many fields will be remembered and appreciated. ’ REFERENCES (1) Leach, A. R. Molecular Modeling Principles and Applications. Conformational Analysis, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, U.S., 2001; Chapter 9.
ARTICLE
(2) Noszal, B.; Kraszni, M. Conformer-Specific Partition Coefficient: Theory and Determination. J. Phys. Chem. B 2002, 106, 1066– 1068. (3) Perola, E.; Charifson, P. S. Conformational Analysis of DrugLike Molecules Bound to Proteins: An Extensive Study of Ligand Reorganization upon Binding. J. Phys. Chem. 2004, 47, 2499–2510. (4) Zimmermann, S. S.; Pottle, M. S.; Nemethy, G.; Scheraga, H. A. Conformational-Analysis of 20 Naturally Occurring Amino Acid Residues using ECEPP. Macromolecules 1977, 10 (1), 1–9. (5) Perahia, D.; Pullman, A. Success of PCILO method and failure of CNDO/2 method for predicting conformations of some conjugated systems. Chem. Phys. Lett. 1973, 19 (1), 73–75. (6) Leach, A. R.; Prout, K; Dolata, D. P. Automated Conformational Analysis and Structure Generation: Algorithms for Molecular Perception. J. Chem. Inf. Comput. Sci. 1990, 30, 316–324. (7) Judson, R. S.; Jaeger, W. P.; Treasurywala, A. M.; et al. Conformational SearchingMethods for Small Molecules: Genetic Algorithm Approach. J. Comput. Chem. 1993, 14, 1407–1414. (8) (a) Kirkpatrick, S.; Gellat, C. D.; Vecchi, M. P., Jr. Optimization by Simulated Annealing. Science 1983, 220, 671–680. (b) Kannan, S.; Zacahrias, M. Simulated Annealing coupled replica exchange molecular dynamics - an efficient conformational sampling method. J. Struct. Biol. 2009, 166, 288–294. (9) Lee, J.; Scheraga, H. A.; Rackovsky, S. New optimization method for conformational energy calculations on polypeptides: Conformational space annealing. J. Comput. Chem. 1997, 18 (9), 1222–1232. (10) Purisima, E. O.; Scheraga, H. A. An approach to the multipleminima problem in protein folding by relaxing dimentionality - Tests on enkephalin. J. Mol. Biol. 1987, 196 (3), 697–709. (11) Isogai, Y.; Nemethy, G.; Scheraga, H. A. Enkephalin - Conformational analysis by means of empirical energy calculations. Proc. Natl. Acad. Sci. U.S.A. 1977, 74 (2), 414–418. (12) Glick, M.; Rayan, A.; Goldblum, A. A Stochastic Algorithm for Global Optimization and for Best Populations: A Test Case of Side Chains in Proteins. Proc. Natl. Acad. Sci. U.S.A. 2002, 99 (2), 703–708. (13) Glick, M.; Goldblum, A. A Novel Energy-Based Stochastic Method for Positioning Polar Protons in Protein Structures from X-rays. Proteins 2000, 38 (3), 273–87. (14) Rayan, A.; et al. Stochastic Algorithm for Kinase Homology Model Construction. Curr. Med. Chem. 2004, 11 (6), 675–692. (15) Rayan, A.; Senderovitch, H.; Goldblum, A. Exploring the Conformational Space of Cyclic Peptides by a Stochastic Search Method. J. Mol. Graph. Model. 2004, 22, 319–333. (16) Rayan, A.; Marcus, D.; Givaty, O.; Barasch, D.; Goldblum, A. Double Focusing by Molecular Bioactivity and Drug Likeness. Presented at 230th ACS meeting. Washington, DC, Aug 28-Sep 1, 2005. (17) Antoine, R.; Compagnon, I.; Rayane, D.; et al. Electric Dipole Moments and Conformations of Isolated Peptides. Eur. Phys. J. D 2002, 20, 583–587. (18) Malagoli, M.; Manoharan, M.; Kippelen, B.; et al. Impact of Conformation on the Dipole Moment of Bis-triarylamine Derivatives. Chem. Phys. Lett. 2002, 354, 283–290. (19) RCSB Protein Data Bank. http://www.rcsb.org/pdb/home/ home.do. (20) Tripos Sybyl 7.2 Molecular Modeling Software; Tripos: St. Louis, MO; http://www.tripos.com. (21) Leach, A. R. Molecular Modeling Principles and Applications; Electrostatic Interactions, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 2001; section 4.9. (22) Section 2.4.9. Tripos Bookshelf version 7.2.3. Revised early 2006. (23) Lide D. R. Dipole Moments of Molecules in Gas Phase. CRC Handbook for Chemistry and Physics; CRC Press: Boca Raton, FL, 1997; section 9. (24) Halgren, T. A. Characterization of MMFF94, MMFF94s, and other widely available force fields for conformational energies and for 5808
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809
The Journal of Physical Chemistry A
ARTICLE
intermolecular-interaction energies and geometries. J. Comput. Chem. 1999, 20 (7), 730–748. (25) Reif, F. Fundamentals of Statistical and Thermal Physics; McGraw-Hill: New York, 1965. (26) Clark, M.; Cramer, R. D.; Vanopdenbosch, N. Validation of the General-purpose TRIPOS 5.2 Force-field. J. Comput. Chem. 1989, 10 (8), 982–1012. (27) Nicklaus, M. C.; Wang, S.; Driscoll, J. S.; et al. Conformational Changes of Small Molecules Binding to Proteins. Bioorg. Med. Chem. 1995, 3, 411–428. (28) Vieth, M.; Hirst, J. D.; Brooks, C. L., III. Do active site conformations of small ligands correspond to low free energy solution structures. J. Comp.-Aid. Mol. Des. 1998, 12, 563–572. (29) Hadfield, A. T. Analysis of three structurally related antiviral compounds in complex with human rhinovirus 16. Proc. Natl. Acad. Sci. U.S.A. 1999, 96 (26), 14730–14735. (30) McClellan, A. L. Tables of Experimental Dipole Moments; Rahara Enterprises: 1989; Vol. 3.
5809
dx.doi.org/10.1021/jp108837a |J. Phys. Chem. A 2011, 115, 5794–5809