Determination of Absolute Stereochemistry of Flexible Molecules

2 days ago - Experimental determination of the absolute stereochemistry of chiral molecules has been a fundamental challenge in natural sciences for ...
0 downloads 0 Views 3MB Size
Article Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

pubs.acs.org/jcim

Determination of Absolute Stereochemistry of Flexible Molecules Using a Vibrational Circular Dichroism Spectra Alignment Algorithm Lennard Böselt,† Dominik Sidler,† Tobias Kittelmann,‡ Jürgen Stohner,‡ Daniel Zindel,† Trixie Wagner,¶ and Sereina Riniker*,† †

Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland ¶ Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland ‡

J. Chem. Inf. Model. Downloaded from pubs.acs.org by BETHEL UNIV on 03/27/19. For personal use only.

S Supporting Information *

ABSTRACT: Experimental determination of the absolute stereochemistry of chiral molecules has been a fundamental challenge in natural sciences for decades. Vibrational circular dichroism (VCD) spectroscopy represents an attractive alternative to traditional methods like X-ray crystallography due to the use of molecules in solution. The interpretation of measured VCD spectra and thus the assignment of the absolute configuration relies on quantum-mechanical calculations. While such calculations are straightforward for rigid molecules with a single conformation, the need to estimate the correct conformational ensemble and energy landscape to obtain the appropriate theoretical spectra poses significant challenges for flexible molecules. In this work, we present the development of a VCD spectra alignment (VSA) algorithm to compare theoretical and experimental VCD spectra. The algorithm determines which enantiomer is more likely to reproduce the experimental spectrum and thus allows the correct assignment of the absolute stereochemistry. The VSA algorithm is successfully applied to determine the absolute chirality of highly flexible molecules, including commercial drug substances. Furthermore, we show that the computational cost can be reduced by performing the full frequency calculation only for a reduced set of conformers. The presented approach has the potential to allow the determination of the absolute configuration of chiral molecules in a robust and efficient manner.



INTRODUCTION Chirality of molecules is fundamental to life. As enantiomers have different binding affinities for proteins, the knowledge about the absolute configuration of a drug compound is crucial in the pharmaceutical industry.1,2 Various direct and indirect approaches such as single crystal X-ray diffraction, chiroptical methods (e.g., optical rotation), or NMR combined with chiral derivatives can be used to assign the absolute configuration of a compound.3 X-ray diffraction is typically regarded as the most reliable method;4 however, crystals of suitable quality and size are required. For these reasons, conformationally flexible molecules or complex natural products represent particular challenges for single crystal X-ray diffraction. A promising alternative that circumvents many of the disadvantages of X-ray crystallography is vibrational circular dichroism (VCD) spectroscopy,5−11 a technique closely related to infrared (IR) spectroscopy but sensitive to chirality. VCD spectra can be measured in solution. The fundamental theory behind VCD is described by eq 112 IVCD ∝ Rab = Im(⟨a|μe |b⟩⟨b|μm |a⟩)

parity, and thus enantiomers give exactly the same spectrum with opposite sign.7 The rotational strength is connected to the experimentally accessible differential molar absorptivity Δϵ by eq 27 ÄÅ ÉÑ ÅÅ ÑÑ γj/π ÅÅ ÑÑ ν̃ ÅÅ Ñ Δϵ(ν)̃ = R · ∑ j 2 2Ñ −39 Å ÑÑ Å 2.236·10 ( − ) + ν ν γ ̃ ̃ Å j j Ñ j (2) ÇÅÅ ÖÑÑ where ν̃ is a specific wavenumber in the spectrum, R j is the rotational strength of peak j, γj is the decay constant of peak j, and ν̃j is the wavenumber of peak j. The absolute configuration of a molecule is then assigned by comparing theoretical spectra of both enantiomers with the experimental spectrum. Nowadays, quantum-mechanical (QM) software packages are readily available to solve eq 1 and to obtain theoretical spectra.6 However, the comparison of theoretical and experimental spectra to establish an unequivocal correlation remains cumbersome for multiple reasons, which has hampered the routine use of VCD spectroscopy:

(1)

where IVCD is the experimentally measured intensity, R ab is the rotational strength belonging to the transition of state a to b, and μe and μb are the electric and magnetic dipole operators, respectively. The product ⟨a|μe |b⟩⟨b|μm |a⟩ is always of odd © XXXX American Chemical Society

Special Issue: Women in Computational Chemistry Received: November 7, 2018

A

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 1. Illustration of global DNA sequence alignment using the NW22 algorithm. The input is two sequences of characters, where the first one is the reference. Note that the order of the characters in a sequence is fixed.

based on self and overlap integrals of the spectra after optimal scaling and shifting of individual peaks. The similarity index is computed after spectral alignment and is similar to the Pearson coefficient in statistics. To use VCD methodology for flexible, pharmaceutically relevant molecules, Sherer et al.14 developed a procedure based on extensive sampling of the conformational space using an in silico conformer generator, the selection of low-energy conformers as well as diverse conformers, and applying either a modified version of the SimVCD approach or the similarity index as implemented in BioTools for spectra matching. The consideration of diverse conformers in addition to the lowest-energy conformers is necessary in this procedure due to issue (e) above, i.e. the weights based on gas-phase QM energies may not represent the weights in solution. In general, a more robust determination of relevant conformers with their appropriate weights would be highly advantageous to enable a routine use of VCD. However, employing ab initio MD or QM/MM MD simulations with explicit solvent to determine the conformational ensemble in solution is currently still too expensive in most cases. In order to improve on the robustness of the existing spectra-matching methods for flexible molecules, we propose a novel approach to automatize the VCD spectra comparison. It is based on the observation that the experimental and theoretical VCD spectra should have the same “pattern” of peaks, although shifted with respect to each other. If the spectra are viewed as a series of peaks with different intensities, a Needleman-Wunsch (NW)22 type algorithm, which is normally used for global sequence alignment in bioinformatics, can be employed to align the theoretical and experimental VCD spectra. This is a rather different Ansatz then used in other methods.14,17−20 The main goal of the VCD spectra alignment (VSA) algorithm is thereby to solve the practical problem of determining which enantiomer reproduces the experimental spectrum better, providing confidence for the assignment of the absolute configuration. As the NW algorithm is an optimization algorithm, it cannot be guaranteed in all cases that every matched peak pair is correct. The power of the VSA approach is demonstrated by the successful determination of the absolute configuration for a range of conformationally rigid, semirigid, and flexible molecules of significant structural complexity, including pharmaceutically active molecules. We show that this can be achieved for theoretical spectra calculated with limited computational resources (i.e., no implicit solvent model or high level of theory was used in the computation), as might apply in a high-throughput setting in industry.

(a) Although intermolecular solute−solute interactions can contribute to the measured spectrum, they are usually neglected in the QM calculations to reduce the computational cost.9,13 Similarly, solvent effects are usually not considered explicitly, which may affect the frequency and sign of normal modes.13 (b) In the QM calculations, a high level of theory as well as the consideration of anharmonicities would be necessary to obtain the correct sign for every peak in the spectrum. However, very accurate QM methods are oftentimes not feasible.14 (c) In frequency calculations, the potential energy operator is truncated after the second term of the Taylor expansion, resulting in shifts of the peaks in the theoretical spectrum.15 Moreover, anharmonic effects like Fermi resonance might change the character of the experimental spectrum.16 (d) As multiple conformers may contribute to the experimental spectrum of flexible molecules, the individual conformers and their VCD spectra need to be determined. Thus, for molecules with many rotational bonds, the computation of the potential-energy landscape becomes computationally expensive and is a major bottleneck of the standard protocol using VCD.14 (e) In addition, the appropriate weights of all relevant conformers need to be determined to reproduce the experimental spectrum.17−19 Typically, the weighting is based on Boltzmann using the energies obtained from gas-phase QM optimization. However, these energies may not correspond to those in solution, resulting in incorrect weights of conformers. The above-mentioned issues are the reason that VCD spectroscopy is still challenging to apply to flexible molecules. With sufficiently large computational resources available, most can be addressed. However, in an industrial setting the demand is for a high throughput technique, which is capable of handling large and flexible molecules at low computational cost (i.e., gas-phase calculation, use of lower-accuracy QM method). Different methods were introduced in the past to simplify the comparison between the theoretical and experimental spectrum. Nicu et al.20 introduced the concept of robust and nonrobust peaks. Using the property of scalar products, eq 1 can be written as Rab = cos(ζ )|μe ||μm |

(3)

where ζ is the electromagnetic angle, which solely determines the sign of Rab. The authors argued that only theoretical bands with an angle significantly different from 90° should be used to determine the absolute stereochemistry, as they are unlikely to change signs by inaccuracies in the calculation. However, it was noticed later that the electromagnetic angle is a noninvariant property and therefore unsuitable for determining the absolute stereochemistry.21 Different quantitative measures have been proposed to judge how likely an aligned theoretical spectrum fits to an experimental spectrum. Examples are SimVCD,17 the similarity index,18 and the dissymmetry factor.19 SimVCD is



DESIGN OF THE VSA ALGORITHM

The Needleman-Wunsch (NW) algorithm22 is used in bioinformatics for the global alignment of DNA/RNA and amino-acid sequences. The algorithm relies on a scoring function s, which accounts for gaps, mismatches (mutations), and matches between the sequences, while the order of characters in a sequence is fixed (Figure 1). The scoring function s is the key ingredient in the NW algorithm. For the alignment of VCD spectra, the peaks of the B

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 2. Schematic illustration of the VSA algorithm using compound 1 as an example. (Step 1): The normalized experimental (black) and theoretical (red) VCD spectra are the input. For the theoretical spectra, a Lorentz bandwidth of 6 cm−1 is used. (Step 2): The experimental and theoretical peaks are picked automatically (shown as dots). (Step 3): For each theoretical peak (red dot), a NW score sij is computed using eq 4. The green vertical line visualizes the assumed “correct” position of the peak (using the scaling factor μ = 0.99). The colored lines denote the score sij given an intensity Ij of the experimental peak (yellow: Ij = 0.4 , dark blue: Ij = 0.3, orange: Ij = 0.2 , light blue: Ij = 0.1). Note that the sign of the peaks is not taken into account, and the order of the peaks in a spectrum is preserved. If requested, upward shifts can be prevented by a cutoff. The algorithm matches the peak pairs such that the global score S is maximized. (Step 4): For visual inspection, the theoretical peaks are superimposed with the matched peaks aligned.

issues (a)−(c) discussed in the Introduction. With its parameters σ 2 and μ, the Gaussian function describes a “window” around a theoretical peak in which the dynamic programming algorithm “looks” for an experimental peak with similar intensity. It can be viewed similar to the maximum allowed shift of 20 cm−1 in the approach by Sherer et al.14 Note that eq 4 allows in principle upward shifts of theoretical peaks. Although in general theoretical peaks are considered to be at higher wave numbers than the experimental peaks due to the neglect of anharmonicities (i.e., only downward shifts should occur in the alignment), there are rare cases such as efavirenz (compound 14) where the theoretical peaks are clearly at lower wave numbers. Upward shifts can be prevented in the VSA algorithm in a straightforward manner by a cutoff. In this study, we allow only downward shifts for rigid and semirigid molecules, whereas we investigate both possibilities for the flexible molecules. The scoring function takes νĩ , νj̃ , Ii , and Ij as input variables,

experimental and theoretical spectra are used as input variables (characters in the sequence) for the NW algorithm. The following scoring function has been designed to match peaks considering only their intensity not their sign

sij(νĩ , νj̃ , Ii , Ij ; μ , σ 2) =

2 yz zyz jij ij νĩ j zz − μ j z j z j |Ii·Ij| jj k νj̃ { zzzz jj− · exp jj 2π σ 2 2σ 2 zzzz jj jj zz k {

(4)

where ν̃i is the wavenumber of a theoretical peak i, ν̃j is the wavenumber of an experimental peak j, Ii is the corresponding intensity of the theoretical peak i, Ij is the corresponding intensity of the experimental peak j, and σ 2 and μ are the variance and the mean of a Gaussian function that accounts for the shift between theoretical and experimental peaks. Ii is computed as Ii =

while μ and σ 2 are predefined parameters. These parameters were determined using vibrational frequency scaling factors in the Computational Chemistry Comparison and Benchmark Database (CCCBDB) of the National Institute of Standards and Technology (NIST).23 The theoretical results in the database were obtained with the same level of theory as used in this work. The fitting procedure is described in detail in the Experimental Section. Although the sign information is essential for determining the absolute stereochemistry, it is

νj̃

R · i −39 π 2.236·10

(5)

which describes the intensity of the theoretical peak i at the experimental peak position ν̃j . The scoring function works on the unprocessed theoretical spectrum (no broadening) and the experimental spectrum, and it does not contain physical meaning. The exponential term is necessary because theoretical peaks are shifted with respect to the experimental peaks due to C

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling K−1

not taken into account during the alignment step (i.e., in the scoring function) but only afterward when calculating the quantitative measure of the alignment. The reason for this is that the issues listed in the Introduction can cause changes in the sign of a peak. In addition, we found that the inclusion of the sign information increases the risk of overfitting. The NW algorithm optimally aligns the sequences of peaks by maximizing the final score S



S=

matched peaks i , j

r=

N−1

∑ i=0

ci N

K−1

K−1

∑k = 0 Ct2, k ∑k = 0 Ce2, k

(8)

where the theoretical channels Ct are dependent on the weights of the conformers. The weights of the contributing conformers (starting from Boltzmann weights based on gas-phase QM energies) can then be adjusted by maximizing the absolute value of the Pearson coefficient |r |. This results in an optimized theoretical VCD spectrum. Under the assumption that the scoring function maps theoretical and experimental peaks perfectly, these adjusted conformer weights should provide a link to the experimentally observed conformer distribution. We focus in the VSA approach on the VCD spectrum. As every VCD peak corresponds to an IR peak, the alignment should in principle lead to an alignment of the theoretical IR spectrum with the experimental IR spectrum. We show this by overlaying the experimental IR spectrum with the theoretical IR spectrum, which was obtained by taking the absolute value of the aligned VCD spectrum. There are cases where peaks that are not resolved in the IR spectrum may be resolved in the VCD spectrum (and vice versa). This issue could be addressed by also considering the IR spectrum for the peak selection and alignment procedure.

sij (6)

The workflow of the VSA algorithm is shown schematically in Figure 2. The design of the algorithm involves additional assumptions. Similar to IR spectroscopy, VCD is a fast method, i.e. each contributing conformer is usually visible in the experimental spectrum. Therefore, it is assumed that for flexible molecules each conformer can be aligned independently to the experimental spectrum. The alignment algorithm further assumes that no conformer of the incorrect enantiomer exists that results in the same spectrum as generated by a conformer of the correct enantiomer in the spectral region of interest. This assumption may not hold necessarily for very high-energy conformers, but these are typically removed during the conformer search procedure. In addition, we consider the QM method B3LYP/6-31G**24−27 to be sufficiently accurate for the calculation of theoretical VCD spectra. The VSA algorithm is not limited to this level of theory, but the usage of other combinations of functionals and basis sets might require a refitting of σ 2 and μ. However, such a refitting can be done in a very short time. The theoretical VCD spectra of both possible enantiomers are aligned to the experimental spectrum. In order to optimize the weights of the contributing conformers, the quality of the alignment needs to be quantified. The final score S from the VSA algorithm cannot be used for this purpose as it is the same for both enantiomers. In addition, it is not obvious how the scores of the individual conformers of flexible molecules are to be combined in a global score. An alternative approach is to calculate a quantitative measure of the quality of the alignment. As the aligned peaks match exactly with the VSA algorithm, the well-known Pearson coefficient used in statistics to compute the linear correlation of two variables can be employed directly as a quantitative measure. We note that the similarity index proposed in ref 18 is very similar to the Pearson coefficient but introduces additional parameters. The Pearson coefficient can take values between −1 (anticorrelation), 0 (no correlation), and +1 (correlation). In the framework of a symmetric scoring function in eq 4, the Pearson coefficient should be positive for the correct enantiomer and negative for the incorrect one. To calculate the Pearson coefficient of two aligned VCD spectra, each spectrum is divided into K channels with a width of 6 cm−1. For each channel, Ck is computed as Ck =

∑k = 0 Ct , k ·Ce , k



RESULTS AND DISCUSSION The VSA algorithm was tested on a set of rigid molecules and semirigid molecules with a small number of conformers, as well as a set of highly flexible molecules, including pharmaceutically relevant compounds. Many of the flexible molecules have a high number of rotatable bonds, providing a challenging test for the VSA algorithm. The experimental VCD spectra of part of the molecules were taken from the literature;14,28−30 for others the VCD spectrum was measured in this study. Rigid Molecules. The first test set consists of rigid molecules with a single conformer (Scheme 1). Using this set Scheme 1. Conformationally Rigid Molecules with a Single Conformera

a The experimental VCD spectra were either taken from ref 14 for compound 7 or experimentally determined using commercially available samples.

of molecules, it could be tested whether the VSA algorithm is able to match the correct pairs of theoretical and experimental peaks without the added complexity of conformational sampling. In Table 1, the Pearson coefficients for the correct and incorrect enantiomers are listed for the set of conformationally rigid molecules. The chosen wavenumber range was

(7)

where N is the number of data points in the channel, and ci is a data point (i.e., intensity). The Pearson coefficient r between the experimental (subscript e) and theoretical (subscript t) spectra is then calculated as D

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

(|r | = 0.60, Figure S9 in the Supporting Information). In general, however, the automatic and manual peak selection yielded similar alignments (Table S1 in the Supporting Information). Based on the rigid test set, the following issues with the scoring function and with the Pearson coefficient as a measure of the quality of the alignment can be observed. First, the Pearson coefficient is sensitive to the relative intensities of the peaks. In the case of compound 1 for example, the alignment of the theoretical and experimental spectra by the VSA algorithm is perfect but due to differences in the relative intensities (after normalization) the obtained Pearson coefficient (|r | = 0.86) is not as high as for compounds 3 and 5. This indicates that a visual inspection can be useful in addition to the calculation of the Pearson coefficient. Second, the Pearson coefficient is sensitive to the baseline. For compound 6 for example (Figure S6 in the Supporting Information), a constant offset of the baseline of the experimental spectrum with respect to the theoretical spectra is present, impacting the Pearson coefficient negatively (|r | = 0.62 ). This issue could be addressed by a better preprocessing of the experimental spectrum (e.g., employing a more sophisticated baseline correction or zeroing these regions). Third, the correct matching of peak pairs with the VSA algorithm can be impacted if the experimental spectrum is noisy or if multiple peaks of similar intensity are close by in the low wavenumber range. The latter was observed for example for compound 6. Manual picking of the experimental peaks did not improve the alignment (Table S1 and Figure S13 in the Supporting Information). Fourth, the signs of the peaks in the theoretical versus the experimental spectrum were found to disagree in some cases, for example for compound 2 in the region 1000−1100 cm−1 (Figure 3). This could be due to the issues discussed in the Introduction (i.e., neglecting intermolecular interactions, too low level of theory in the QM calculations, or broadened peaks in the experimental spectrum). The calculation of the Pearson coefficient does not account for these potential issues. Nevertheless, the correct

Table 1. Performance of the VSA Algorithm on the Set of Conformationally Rigid Molecules (Scheme 1) As Measured by the Pearson Coefficient r for the Correct (c) and Incorrect (ic ) Enantiomersa compound

a [cm−1]

b [cm−1]

rc

ric

1 2 3 4 5 6 7

1000 1000 1025 1000 1000 1000 1000

1600 1500 1500 1500 1500 1350 1600

0.86 0.47 0.90 0.70 0.90 0.62 0.81

−0.86 −0.47 −0.90 −0.70 −0.90 −0.62 −0.81

a The alignment was performed in the wavenumber range [a, b]. Experimental peaks were selected automatically; no upwards shifts were allowed.

determined by the quality of the experimental spectrum. For all rigid molecules, the Pearson coefficient rc is greater than zero, confirming the correct enantiomer. Note that the Pearson coefficients of the two enantiomers are symmetrical (rc = −ric ) because the chosen scoring function given in eq 4 does not take the sign of the peaks into account (see discussion above for the reason). As examples for the alignment of experimental and computational spectra, the unprocessed spectra as well as the aligned spectra for compounds 1 and 2 are shown in Figure 3. The same information is shown for the other molecules in Figures S1−S7 in the Supporting Information (with manual peak selection in Figures S8−S14). As can be seen in Figure 3, the experimental and theoretical spectra could be aligned nearly perfectly for compound 1, which is reflected by a high value of the Pearson coefficient (|r | = 0.86). For compound 2, the Pearson coefficient is lower (|r | = 0.47) than for the other compounds as the peaks between 1200 and 1300 cm−1 are not well aligned using the automatic peak selection for the experimental spectrum. When the peaks were selected manually, the Pearson coefficient increased for this compound

Figure 3. Comparison of the theoretical and experimental VCD spectra for the two enantiomers of (+)-fenchone (1, A) and (−)-β-pinene (2, B). The unprocessed theoretical spectra (left) and the aligned spectra (right) are shown with the experimental in black, the correct enantiomer in red, and the incorrect enantiomer in orange. Alignment was performed with the VSA algorithm, experimental peaks were picked automatically, and no upward shifts were allowed. E

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

conformer(s) based on QM gas-phase energies is similar to the dominant conformer predicted by the optimization scheme. To analyze the effect of the optimization approach, we discuss in detail (R)-(+)-3-methylcyclohexanone (8), for which the population of the two dominant conformers has been studied in the literature with different experimental techniques and computation. The methyl group is in an equatorial position in the major conformer (e) and in an axial position in the minor conformer (a). The two conformers give rise to profoundly different VCD spectra (Figure 5). Experimental studies have determined the free-energy difference ΔGc between the conformers to be 6.8 kJ/mol in the gas phase,31 4.6 kJ/mol in CDCl3,32 and 5.2 kJ/mol in CCl4.33 A later study found the energy difference ΔHc between the conformers in CDCl3 to be 5.6 kJ/mol.34 In ref 33, the energy difference in the gas phase with B3PW91/TZVP was calculated to be 6.2 kJ/mol, which increased to 6.3 kJ/mol when a polarized continuum model (PCM) for CCl4 was used instead of decreasing.33 With B3LYP/6-31G** in the gas phase, we found an energy difference of 6.9 kJ/mol. Optimizing the weights, the energy difference decreased to 5.0 kJ/mol (Figure 5), which is close to the experimental values. Thus, for 3-methylcyclohexanone the optimization was able to identify a conformer distribution that reflects the distribution in solution better than the gas-phase distribution. Future research on more examples will show if this holds in general. Flexible Molecules. The VSA algorithm was further tested on a set of 11 conformationally flexible compounds (Scheme 3), most of them drug molecules. The performance of the VSA algorithm for this set is shown in Table 3. The aligned experimental and theoretical VCD spectra with and without optimization of the conformer distribution are shown for all flexible molecules in Figures S30−S40 and Figures S41−S51, respectively, in the Supporting Information (with manual peak selection in Figures S52−S62). For all molecules, the unoptimized Pearson coefficient for the correct enantiomer rc is positive, and thus, the correct stereochemistry of all tested compounds can be assigned based on rc > 0. However, for compounds 17 and 21 rc is close to zero before optimization, indicating issues with the alignment. From Figure 6A, it is clear that the baseline in the experimental spectrum of compound 17 has not been corrected appropriately, resulting in a low Pearson coefficient. In the case of compound 21, both the quality of the experimental spectrum is low (only a relatively small wavenumber range could be used for alignment) and the conformer selection and distribution might not be appropriate. These are thus good examples that the Pearson coefficient can be used to detect problematic

enantiomer could be assigned in all tested cases based on rc > 0. Semi-Rigid Molecules. The second test set consisted of semirigid molecules with a small number of relevant conformers that can be identified in a straightforward manner (Scheme 2). The performance of the VSA algorithm for this Scheme 2. Conformationally Semirigid Molecules with a Single Conformera

a

The experimental VCD spectra were either taken from ref 30 for compound 8 or from ref 14 for compounds 9−14.

set is shown in Table 2. The aligned experimental and theoretical VCD spectra with and without optimization of the conformer distribution are shown for all semirigid molecules in Figures S15−S22 in the Supporting Information (with manual peak selection in Figures S23−S29). For all semirigid molecules except efavirenz (14), the unoptimized Pearson coefficient (Boltzmann weights based on QM gas-phase energies) for the correct enantiomer rc is positive, and thus the correct stereochemistry of all tested compounds can be assigned based on rc > 0. Efavirenz is one of the rare cases, where the theoretical peaks are clearly at lower wave numbers than the experimental peaks (Figure 4A). When allowing upward shifts in the alignment with the VSA algorithm, the theoretical spectrum can be aligned well, and the resulting Pearson coefficient for the correct enantiomer is rc = 0.77 (Figure 4C and Figure S22 in the Supporting Information). For the semirigid molecules, only a small change in the alignment occurred when applying the optimization procedure for the conformer weights. This implies that the lowest-energy

Table 2. Performance of the VSA Algorithm for Conformationally Semirigid Molecules (Scheme 2) As Measured by the Pearson Coefficient r for the Correct (c) and Incorrect (ic) Enantiomersa unoptimized −1

−1

optimized

compound

no. of conformers

a [cm ]

b [cm ]

rc

ric

rc

ric

8 9 10 11 12 13 14

2 2 2 3 3 2 2

1000 1000 1000 1000 1000 1000 1000

1500 1450 1600 1600 1600 1600 1600

0.93 0.50 0.71 0.48 0.78 0.68 0.02

−0.93 −0.50 −0.71 −0.48 −0.78 −0.68 −0.02

0.93 0.50 0.76 0.69 0.89 0.70 0.11

−0.93 −0.50 −0.76 −0.69 −0.89 −0.70 −0.11

a

The alignment was performed in the wavenumber range [a, b]. Experimental peaks were selected automatically; no upwards shifts were allowed. F

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 4. Efavirenz, compound 14. The experimental VCD spectrum is shown in black, and the theoretical VCD spectrum is shown in red for the correct enantiomer and orange for the incorrect enantiomer. Experimental peaks were selected automatically. No optimization of the conformer distribution was applied. (A): Unprocessed theoretical spectra. (B): Aligned theoretical VCD spectra without upward shifts allowed (left) and with upward shifts allowed (right) in the VSA algorithm.

Figure 5. (R)-(+)-3-Methylcyclohexanone, compound 8. The experimental VCD spectrum is shown in black, and the theoretical VCD spectrum is shown in red for the correct enantiomer and in orange for the incorrect enantiomer. Experimental peaks were selected manually; no upward shifts were allowed. (A): Aligned theoretical VCD spectra of the major conformer e (left) and the minor conformer a (right). (B): QM energy E of the two conformers relative to conformer e in the gas phase (blue), optimized for the correct enantiomer (red). (C): Combined aligned theoretical VCD spectra with optimized weights.

alignments (|r | < 0.2). These can then be inspected visually to decide appropriate measures (e.g., improvement of the experimental spectrum, use of a more sophisticated conformational search strategy, etc.). Optimization of the absolute Pearson coefficient |r | improved the alignment substantially in some cases. In an ideal case, the optimized conformer distribution should better reflect the true distribution in solution. In general, however, many factors can influence the optimization such as the accuracy of the conformational search, the finite number of conformers considered, overlapping peaks in the experimental spectrum, alignment errors, etc. In the case of compound 16 for example, the lowest-energy conformer based on QM gasphase energies was predicted to be energetically less favorable

by the optimization scheme (see Figure S41 in the Supporting Information). If the lowest-energy conformer is indeed the dominant conformer with respect to the VCD spectrum, the improvement of the alignment by the optimization scheme is much smaller (e.g., for compounds 18, 19, and 20). For fluralaner (25), allowing upward shifts in the VSA algorithm led to a substantial improvement of the alignment. However, in this case the issue is most likely not theoretical peaks at lower wave numbers but the conformer distribution (Figure 7). For the other flexible molecules, the difference was generally small, thus in the following we focus on results with upward shifts allowed in the alignment. For some of the flexible molecules more than 200 conformers were obtained with the strategy employed for G

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Scheme 3. Conformationally Flexible Molecules Used for Testinga

a The experimental VCD spectra were either taken from ref 14 for compounds 15−20, extracted from ref 29 for compounds 23 and 24, extracted from ref 28 for compound 25, or experimentally determined using commercially available samples (compounds 21 and 22).

Table 3. Performance of the VSA Algorithm for Conformationally Flexible Molecules (Scheme 3) As Measured by the Pearson coefficient r for the Correct (c) Enantiomera unoptimized rc

optimized rc

compound

no. of conformers

a [cm−1]

b [cm−1]

D

U+D

D

U+D

15 16 17 18 19 20 21 22 23 24 25

12 126 73 141 74 202 5 98 205 179 44

1000 1000 1000 1000 1150 1000 1000 1050 1000 1000 1000

1600 1600 1600 1600 1450 1600 1350 1450 1600 1600 1600

0.57 0.37 0.10 0.79 0.47 0.66 0.04 0.22 0.58 0.67 0.36

0.57 0.30 0.11 0.67 0.57 0.59 0.04 0.25 0.57 0.74 0.79

0.63 0.63 0.24 0.86 0.70 0.86 0.16 0.54 0.82 0.85 0.80

0.64 0.64 0.64 0.86 0.69 0.84 0.14 0.60 0.84 0.87 0.95

a The alignment was performed in the wavenumber range [a, b]. Experimental peaks were selected automatically. Results with no upwards shifts allowed are marked with D; results with shifts in both directions are marked with U + D.

molecules using only the 20 conformers lowest in energy. The unoptimized and optimized aligned VCD spectra are shown in Figure 9 for simvastatin (20) and elegandiol (23) as the molecules with the highest number of conformers. The other molecules are shown in Figures S72−S81 in the Supporting Information. For all molecules, similar Pearson coefficients were obtained compared to the ones using the full conformational ensemble. Of course, limiting the number of conformers bears also risks. It is therefore advised to monitor the convergence of the Pearson coefficient (unoptimized conformer distribution) as a function of the number of conformers. Based on our findings, we propose the following procedure after an ensemble of relevant conformers was generated: (1) QM geometry optimization of the conformer set, (2) full frequency calculation for the 20 lowest-energy conformers to

conformational sampling. When plotting the Pearson coefficient (unoptimized weights) as a function of the number of conformers considered (ordered by QM gas-phase energy), it can be seen that |r | converges after approximately 20 conformers for all compounds (see Figure 8 for compounds 20 and 23 as example, the results for the other molecules are shown in Figures S62−S71 in the Supporting Information). The computationally most expensive step in the generation of the theoretical VCD spectrum is the full frequency calculation, whereas the prior geometry optimization with DFT is much faster. Therefore, being able to reduce the number of conformers for which a full frequency calculation has to be performed would decrease the required computational effort substantially. We note that this is a general opportunity, also applicable to other methods for VCD spectra matching. In Table 4, the Pearson coefficients are listed for the flexible H

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 6. Comparison of the theoretical and experimental VCD spectra for the two enantiomers of laropiprant (17, A) and tadalafil (21, B). The unprocessed theoretical spectra (left) and the aligned spectra (unoptimized weights, right) are shown with the experimental in black, the correct enantiomer in red, and the incorrect enantiomer in orange. The full conformational ensemble was used. Experimental peaks were picked automatically; no upward shifts were allowed. The experimental VCD spectrum of 17 was taken from ref 14. Note that the difference in peak heights in the theoretical spectrum before and after the alignment is larger for flexible molecules due to the higher number of conformers that are aligned separately.

alignment, especially for flexible molecules. With the VSA algorithm, this is easily possible by not applying a cutoff during alignment. (2) The selection of peaks in experimental VCD spectra can be challenging due to noise. By comparison with manual selection, it was shown that the VSA algorithm is robust enough to handle a very simple definition of peaks that leads to the selection of many noise peaks. (3) With a quantitative measure of the alignment at hand, the conformer distribution can be refined by optimizing the Pearson coefficient. We showed for the example of 3-methylcyclohexanone that this leads to conformer populations closer to the experimentally determined distribution than based on gasphase energies. We note that for flexible molecules the situation is more complex as additional factors (e.g., exhaustiveness of the conformational search, number of conformers considered, etc.) can influence the alignment and optimization. Therefore, the optimization should not be overinterpreted. (4) As the full frequency calculation is the computational bottleneck in the VCD spectra calculation, it would be beneficial to reduce the number of conformers for which this calculation has to be performed. Monitoring the Pearson coefficient as a function of the number of conformers indicated that it is converged at around 20 conformers for the flexible molecules tested. Limiting the number of conformers in the alignment to 20 resulted in a similar performance than with the full conformer set. In conclusion, we have successfully demonstrated that our approach allows the correct determination of the absolute configuration even for conformationally flexible molecules with significant structural complexity and using theoretical VCD spectra calculated with limited computational resources. Our approach improves the robustness of VCD as a technique for the assignment of absolute configuration; it is straightforward to use and will help to make VCD a true alternative to current standard techniques like X-ray crystallography. Further improvements of the VSA algorithm can be obtained by more sophisticated scoring functions or conformational

obtain the VCD spectra, and (3) alignment of the theoretical and experimental spectra using the VSA algorithm and calculation of the Pearson coefficient. Step (3) can be performed as a function of the number of conformers (1 to 20) to assess the convergence of the Pearson coefficient. If it is not sufficiently converged, the VCD spectra of additional conformers can be calculated and added. When the Pearson coefficient is converged and sufficiently high (>0.2), optimization of the conformer distribution can optionally be carried out.



CONCLUSION We have designed, implemented and validated a VCD spectra alignment (VSA) algorithm to determine the absolute configuration of chiral molecules. The aim was for a robust and efficient approach to handle flexible molecules and spectra calculated with limited computational resources to allow high throughput. The basic idea of the VSA algorithm is that a spectrum is viewed as a series of peaks such that a NeedlemanWunsch-type algorithm can be used for the global alignment. Matched peak pairs determined by the VSA algorithm can then be used for superimposition of the experimental and theoretical VCD spectra. From the aligned spectra, a Pearson coefficient known from statistics can be computed as a quantitative measure of the alignment. If a high Pearson coefficient is obtained, the absolute configuration can be assigned with high confidence. For low Pearson coefficients (i.e., when the initial agreement between experimental and theoretical VCD spectra is marginal), we recommend to step back from making an assignment of the absolute configuration and to visually inspect the aligned spectra to identify possible causes. In such cases, either a better experimental spectrum or improved computational efforts may be necessary such as larger basis sets, different functionals, and/or inclusion of solvent effects and intermolecular interactions. We have investigated different options of the algorithm: (1) Rare upward shifts of theoretical peaks can be necessary for I

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 7. Fluralaner, compound 25. The experimental VCD spectrum is shown in black, and the theoretical VCD spectrum is shown in red for the correct enantiomer and in orange for the incorrect enantiomer. Experimental peaks were selected automatically. The full conformational ensemble was used. (A): Unprocessed theoretical spectra. (B): Aligned theoretical VCD spectra (unoptimized weights) with no upward shifts allowed (left) and with upward shifts allowed (right) in the VSA algorithm. (C): Optimization of conformer weights. (Left): Aligned and optimized theoretical VCD spectra when no upward shifts were allowed in the VSA algorithm. (Right): QM energy E of the conformers relative to the lowest-energy conformer in the gas phase (blue), optimized for the correct enantiomer (red), and optimized for the incorrect enantiomer (orange). No upward shifts were allowed.

Figure 8. Pearson coefficient (unoptimized conformer distribution) for simvastatin (20, left) and elegandiol (23, right) as a function of the number of conformers. Peaks were picked automatically, and upward shifts were allowed.

concentrations in CCl4 were selected such that the IR absorbance was between 0.3 and 0.9 (i.e., 162 mg/mL for 1, 166 mg/mL for 2, 163 mg/mL for 3, 162 mg/mL for 4, 161 mg/mL for 5, 160 mg/mL for 6, 80 mg/mL for 10, 84 mg/mL for 11, 79 mg/mL for 12, and 57 mg/mL for 13). The liquid cell consisted of KBr windows with a path length of b = 0.1 mm. Data was collected in blocks, where the instrument recorded about 1400 scans over the course of 20 min and averaged those scans into one block. Typical runs involved averaging several blocks (≥ 3) for the sample and solvent. Baseline correction was performed by subtracting the spectrum of the other enantiomer (if available) or by subtracting the measured background spectrum.

sampling strategies, as well as the use of higher levels of theory in the QM calculations together with implicit or explicit solvent models.



EXPERIMENTAL SECTION VCD Measurements. Compounds 1−6, 10−13, and 21− 22 were obtained from Sigma-Aldrich. The AC of each compound was known prior to assignment. For compounds 1−5 and 10−12, both enantiomers were available for measurement. Liquid phase VCD spectra of compounds 1−6 and 10−13 were measured with a Bruker Vertex 70 spectrometer equipped with a PMA 50 unit, with 4 cm−1 resolution. Sample J

DOI: 10.1021/acs.jcim.8b00789 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Table 4. Performance of the VSA Algorithm for Conformationally Flexible Molecules Using a Fixed Number of 20 LowestEnergy Conformers (QM Energy in the Gas-Phase) As Measured by the Pearson Coefficient r for the Correct (c) and Incorrect (ic) Enantiomersa unoptimized −1

−1

optimized

compound

no. of conformers

a [cm ]

b [cm ]

rc

ric

rc

ric

16 18 19 20 22 23 24 25

20 20 20 20 20 20 20 20

1000 1000 1150 1000 1150 1000 1000 1000

1600 1600 1600 1600 1450 1600 1600 1600

0.30 0.67 0.56 0.46 0.25 0.57 0.76 0.77

−0.30 −0.67 −0.56 −0.46 −0.25 −0.57 −0.76 −0.77

0.63 0.85 0.66 0.80 0.60 0.75 0.84 0.93

−0.63 −0.85 −0.66 −0.80 −0.60 −0.75 −0.84 −0.93

a

The alignment was performed in the wavenumber range [a, b]. Experimental peaks were selected automatically; upwards shifts were allowed.

Figure 9. Comparison of the theoretical and experimental VCD spectra for the two enantiomers of simvastatin (20, A) and elegandiol (23, B). The unoptimized aligned theoretical spectra (left) and the optimized aligned spectra (right) are shown with the experimental in black, the correct enantiomer in red, and the incorrect enantiomer in orange. Only the 20 lowest-energy conformers were considered. The alignment was performed in the wavenumber range 1000−1600 cm−1 for both compounds. Peaks were picked automatically, and upward shifts were allowed. The experimental VCD spectrum of 20 was taken from ref 14, and the spectrum of 23 was taken from ref 29.

contains also metallorganic compounds and organic radicals, the value of μ was set to 0.99. The standard deviation was systematically optimized until the alignment between the theoretical and experimental VCD spectra of compound 14 was described successfully, resulting in σ 2 = 1.8·10−3. Compound 14 was chosen due to the large shifts between theoretical and experimental peaks that were observed. In general, multiple choices of μ and σ 2 will be possible. Computational Details of the VSA Algorithm. The computational workflow is divided into three parts: (i) conformer generation, (ii) QM optimization and VCD spectra calculation, and (iii) alignment of the obtained theoretical spectra with the experimental spectrum. In the first step, 4500 conformers were generated using the ETKDG method35 as implemented in the open-source cheminformatics toolkit RDKit.36 Similar geometries were discarded if the root-mean-square deviation (RMSD) of the heavy atoms was