Computer-Assisted 3D Structure Elucidation (CASE-3D) of Natural

Jan 11, 2018 - (26) A recent contribution is a hybrid Python/Java system developed by Goodman's group, which encapsulates the DFT chemical shift predi...
4 downloads 13 Views 1MB Size
Article Cite This: J. Nat. Prod. XXXX, XXX, XXX−XXX

pubs.acs.org/jnp

Computer-Assisted 3D Structure Elucidation (CASE-3D) of Natural Products Combining Isotropic and Anisotropic NMR Parameters Armando Navarro-Vázquez,*,† Roberto R. Gil,*,‡ and Kirill Blinov§ †

Departamento de Química Fundamental, Universidade Federal de Pernambuco, Avenida Professor Moraes Rego, 1235, Cidade Universitária, 50670-901 Recife, PE, Brazil ‡ Department of Chemistry, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, Pennsylvania 15213, United States § MestReLab Research S. L. Feliciano Barrera, 9 Baixo, Santiago de Compostela, A Coruña, 15706 Spain S Supporting Information *

ABSTRACT: A computer-assisted structural elucidation (CASE-3D) strategy based on the use of isotropic and/or anisotropic NMR data is proposed to elucidate relative configuration and preferred conformation in complex natural products. The methodology involves the selection of conformational models through the use of the Akaike Information Criterion and scoring of the different configurations. As illustrative examples, the methodology furnished the correct configuration of the already known compounds artemisinin (1) and homodimericin A (2). Revised structures (5 and 6), including their absolute configuration, for the recently reported curcusones I (3) and J (4) are proposed.

C

standard set of homonuclear and heteronuclear 2D NMR experiments. In a typical CASE procedure, a list of different molecular constitutions for the same molecular formula is generated from the analysis of the spectroscopic correlations. Then, the different constitutions are ranked according to empirical chemical shift predictions, mostly 13C shifts, although the use of density functional theory (DFT) predictions have also been proposed.12 However, in most of the cases after the constitution is determined, the presence of stereogenic centers requires the further determination of relative configuration. Classically, relative configuration is established on the basis of spatial information, such as internuclear distances determined from NOE experiments, or angular information from vicinal scalar couplings. In the past decade, two new methodologies have entered the toolbox of the natural product chemists. One is the prediction of chemical shifts and scalar couplings through DFT computations.13−15 The use of 13C chemical shifts is particularly attractive, as they are sufficiently accurate to distinguish between differences in shifts due to configurational/conformational changes and can be measured with high accuracy from proton-decoupled 13C NMR spectra. Statistical estimators such as DP416,17 and its recently improved versions18,19 have been very successful, as they allow an estimation of the relative probability of different configurations to correspond to a given 13 C NMR spectrum.

omplete automation of structural elucidation of natural products can be considered a holy grail in chemistry. But how far are we from achieving this goal when considering the current state of the art? Although single-crystal X-ray analysis will directly furnish molecular constitution and relative and even absolute configuration1 of natural products, crystallization is frequently a cumbersome, if not impossible, task due to the usually very small available amounts of compound, the presence of impurities, and molecular flexibility. Therefore, liquid-state NMR has become the preferred methodology for the structural elucidation of natural products, although other solid-state methodologies such as powder X-ray2 or the more recently introduced molecular sponges3 are opening promising pathways. The determination of molecular constitution, i.e., the 2D molecular graph, by NMR is nowadays a much easier and errorfree task due to the significant improvements in NMR sensitivity as well from the development of 2D NMR pulse programs. For instance, 1H−15N HMBC experiments are extremely useful in the case of alkaloids,4 whereas the use of 1,1-ADEQUATE experiments can alleviate many ambiguous cases in polycyclic molecules where the HMBC experiment becomes extremely ambiguous.5 However, NMR elucidation can be very time-consuming, and from early years of using this technique the possibility of using computer expert systems to help spectroscopists handle NMR-rich structural information became evident. Different methodologies and algorithms developed over the years6−9 have been crystallized today in very mature computer-assisted structural elucidation (CASE) software programs.10,11 These programs can solve efficiently the constitution of molecules of medium complexity given a © XXXX American Chemical Society and American Society of Pharmacognosy

Received: November 2, 2017

A

DOI: 10.1021/acs.jnatprod.7b00926 J. Nat. Prod. XXXX, XXX, XXX−XXX

Journal of Natural Products

Article

single tensor approximation or neglecting vibrational corrections can be artificially compensated by increasing the conformational amplitudes of spurious conformations, the number of statistically relevant conformations was filtered through the use of the Akaike Information Criterion (AIC).33 The AIC aims at minimizing the information lost when a model with k free parameters (in the present case the m − 1 conformational amplitudes, where m is the number of conformations included in the model) is employed to represent “full reality”. The AIC is given by

The other new methodology is the use of anisotropic NMR parameters, principally residual dipolar couplings (RDCs), which have been successfully applied to the determination of relative configuration for several natural products.2,20,21 Also, simple methodologies have been developed22 for the measurement of residual chemical shift anisotropies (RCSAs) in small molecules. These methodologies have been proven successful for the solution of configurational/constitutional problems.5,23 In any case, the problem of molecular flexibility should be tackled since for flexible molecules the observed NMR properties are the result of conformational averaging over the NMR time scale. For most organic molecules, the wells on the potential surface are deep enough as to approximate the whole molecular dynamics as instantaneous jumps between discrete conformations. In the most favorable case, the conformational amplitudes can be well estimated from molecular modeling data, usually at the DFT level. However, in other cases, for instance when intramolecular hydrogen bonds are present, or in highly coordinating solvents such as DMSO, the DFT energies are far from being sufficiently accurate. In this case, conformational deconvolution by least-squares optimization of the conformational amplitudes to fit the reported data is needed. This is the concept behind the popular NAMFIS methodology.24,25 Nonetheless, incorporation of these 3D structural elucidation methodologies into the existing CASE-2D methodologies to solve for relative configuration is far from being a trivial problem, although attempts have been made to incorporate NOE restrictions into CASE programs.26 A recent contribution is a hybrid Python/Java system developed by Goodman’s group, which encapsulates the DFT chemical shift prediction computations and DP4 analysis steps.17 The group of Hoye has also provided scripts for automation of GIAO computations according to their recently published protocol.27 In a recently reported contribution, we have proposed a new protocol based on the use of one-bond (1DCH) RDCs for the automated determination of relative configuration.28 This protocol began with the acquisition of a standard set of NMR experiments (1H, 13C, COSY, HSQC, HMBC) as well as the measurement of 1DCH RDCs through F1-coupled HSQC experiments using reversibly compressible poly(methyl methacrylate) (PMMA) or poly-(2-hydroxyethyl methacrylate) gels for acquisition of RDCs in organic solvents or disodium cromoglycate in the case of water-soluble samples. From the acquired set of NMR spectra, the constitution is then determined using commercially available CASE software. The constitution is ranked according to empirical 13C chemical shift predictions, and the best scoring structure is stored in the form of an SDF29 file along with a list of 1H and 13C chemical shift assignments. This extended protocol, referred to from here as CASE-3D, begins with the generation of initial 3D geometries for each diastereomeric structure. Although this is far from being a trivial problem, currently available commercial software can handle it in a robust way. Schrodinger’s LigPrep software was employed with success.30 Conformational ensembles for each diastereoisomer are then generated using a molecular mechanics conformational search. Conformational amplitudes in these ensembles can then be optimized to fit the experimental RDC data by the use of the single-tensor approximation,31 by simultaneous optimization of alignment tensor components and populations of conformers.32 Since uncertainties on the experimental RDCs and deficiencies in the proposed models such as limitations of the

(1) AIC = ‐2 ln L̂ + 2k where L̂ is the maximum value for the likelihood function for the model being analyzed, i.e., the maximum probability that can be obtained with a k-parameter model, to reproduce a given set of experimental data. If it is assumed, as a working raw approximation, that the uncertainties follow a normal distribution and are uncorrelated, the AIC can be simply cast in terms of the χ2 quality factor, according to the expression

AIC = χ 2 + 2k

(2)

It should be taken into account that uncertainties on real data will be in many cases correlated and will not follow a normal distribution, and therefore eq 2 should be considered as just a practical tool for the purging of gross overfitting effects. It was clear from the beginning that the protocol could be easily extended to a combined use of other NMR observables such as quantitative NOE-derived distances,34 scalar couplings, or even isotropic chemical shifts, as will illustrated here. The χ2 term in eq 2 can then be extended as χ2 =

⎛ d exp − d calc ⎞2 ⎛ Dexp − Dcalc ⎞2 i i i ⎜⎜ i ⎟⎟ ⎟ + ∑ ⎟ D d σ σi ⎝ ⎠ ⎝ ⎠ i

∑ ⎜⎜

⎛ J exp − J calc ⎞2 ⎛ δ exp − δ calc ⎞2 i i i i ⎜ ⎟ ⎟⎟ +∑ ⎜ ⎟ +∑ ⎜⎜ σi J σiδ ⎝ ⎠ ⎝ ⎠

(3)

Note that since square residues in eq 3 are weighted by σ2 uncertainties, the different terms become dimensionless, allowing a natural mixing of the different observables as far as the σ uncertainties can be fairly estimated. No doubt one of the reasons for the popularity of Goodman’s DP4 method is the scoring of the different configurations through a simple probability number computed through Bayesian statistics. Very interestingly, the use of the AIC allows scoring the different trial models in terms of probability ratios. The relative goodness of two different models, e.g., two particular conformational ensembles in the present case, is given by the difference in AIC scoring between them. The relative probability, with respect to the minimum AIC solution, that a particular ith model minimized the information loss is given by

pi pmin

= e−

AICi − AICmin 2.0

(4)

The expression above strictly holds in the case of normally distributed and known standard errors σ. In practical applications, the so-obtained probabilities must be considered as approximate values but nevertheless useful for practical purposes, in this respect not very different from the use of DP4 and related probabilistic scores. B

DOI: 10.1021/acs.jnatprod.7b00926 J. Nat. Prod. XXXX, XXX, XXX−XXX

Journal of Natural Products

Article

A very agnostic approach was employed here for the determination of the σD uncertainty by doing an SVD39 fitting of reported RDC data to a molecular mechanics MMFF94 structure of the strychnine lowest conformation.40 A 1.2 Hz σD standard error was estimated from comparison of experimental with back-computed values. In the case of 13C chemical shifts the experimental values can be measured with high accuracy, but model deficiencies can be present involving either inaccuracies in molecular geometries or the DFT chemical shielding prediction. Additionally, the computed chemical shieldings must be transformed into shifts by the use of a proper reference. Different methodologies have been proposed, but the most agnostic approach is just to use a linear relationship between computed shieldings and predicted chemical shifts. That is

Hence, the algorithm here proposed is based on the generation of combinations of ensembles of m conformers from an initial pool of n conformers as determined by a molecular mechanics search. To avoid astronomical computation times, the generation of ensembles is stopped when the AIC no longer decreases as the number of conformations is increased. The procedure is repeated for all trial configurations, and finally the configuration having the lowest AIC scoring ensemble is reported as the best solution to the structural problem. It will be shown here how the reported CASE-3D protocol can be easily extended to the use of other NMR observables, principally 13C chemical shifts, either alone or in combination with other data such as 1H chemical shits, 1DCH RDCs, and 3 JHH couplings. The performance of the method is first illustrated with two polycyclic products of well-known structure, artemisinin (1) and homodimericin A (2) (Figure 1), and finally, using the present algorithm, the structures of

δ = aσcalc + b

(5)

Hence, the slope a and intercept b are determined by linearly fitting computed shieldings to the experimental values.41 In principle, these linear scaling parameters could be fit simultaneously with conformational amplitudes. However, the presence of extra free parameters certainly lowers the discriminating power. Alternatively, a and b can be determined a priori by previous fitting assigned 13C spectra to molecules of known structure. The last approach was followed in the present work, and it was found that the discriminating power was rather sensitive to the a and b values. In fact, the best fittings and discriminating power were obtained with a and b values determined from fitting 13C chemical shift data from the hydrocarbon β-pinene.42 Here, a and b were determined to be −1.06 and 197.5, respectively. The standard error σδ was then estimated from a B3LYP/6-31G*//MMFF94 DFT shielding computation on strychnine and comparison with the experimental shifts using the proposed a and b scaling values. A value of 2.0 ppm was obtained. In the case of 1H chemical shifts the values a = −1.029 and b = 33.00 were determined just from a fitting of strychnine B3LYP/6-31G*//MMFF94 shieldings to published experimental data.43 The associated standard error was 0.18 ppm. For 3JHH couplings, a standard error σJ was estimated to be 0.5 Hz by fitting reported data44 to the MMFF94 strychnine structure using the Altona equation.45 Having all the necessary requirements ready to use in the present CASE-3D protocol, four different natural products, with a large variety of structural motifs, were tested. The first one is artemisinin (1, Figure 1), a well-known antimalarial agent for which the discovery led to a Nobel prize.46 The second one is homodimericin A (2, Figure 1), recent target of a structural elucidation study based on the combined use of CASE software, RDCs, and RCSAs.5 Finally, a problematic structure elucidation case was addressed, that of the natural products curcusones I and J (putative structure 3 and 4, Figure 6), where the NMR data from recent synthetic studies47 did not agree with the previously proposed structure for the isolated compounds.35 Artemisinin (1). 1D 1H NMR, 1D {1H}13C NMR, 1H,1H− COSY, 1H,13C-HSQC, and 1H,13C-HMBC data obtained on a commercial sample of artemisinin 1 were fed into the MNova structural elucidation software.48 The first scored structure, after the 13C empirical chemical shift prediction, was the reported natural product constitution shown in Figure 1. The 2D SDF file was then fed into the previously published Python program,28 which encapsulates the diastereoisomer generation and conformational search steps. Additionally, new

Figure 1. Artemisinin (1) and homodimericin A (2) structures.

curcusones I and J (3 and 4, Figure 6) were revised on the basis of the analysis of recently published data.35 Additionally, it will be illustrated how unassigned 13C chemical shifts can be easily employed for the revision of published structures.



RESULTS AND DISCUSSION General Considerations. A problem when mixing data of different nature through eq 3 is how to establish an appropriate relative weight in the fitting procedure. Since the uncertainties associated with a single datum σi can be usually very difficult to estimate, it is convenient in practical terms to use general standard errors σD, σd, σ J, or σδ associated with each type of observable. In the case of RDCs, uncertainties may arise from different sources. Determination of D splittings might suffer from strongcoupling effects. This mostly affects individual 1DCH couplings in prochiral methylene groups.36 By using F1-coupled HSQC experiments strong-coupling effects are largely avoided, since only the sum of the coupling is measured, which is insensitive to the coupling between the germinal protons.37,38 The loss of information by no longer having each individual coupling is compensated for by the fact that assignment of the prochiral protons is not needed, greatly simplifying the stereochemical analysis. Other sources of errors come from the molecular modeling. First, static structures are used in the fitting, and therefore vibrational effects are neglected. Additionally, errors can arise from inaccurate geometries, although in our experience this is not an important factor, even, as in this work, when computationally cheap molecular mechanics geometries are employed. C

DOI: 10.1021/acs.jnatprod.7b00926 J. Nat. Prod. XXXX, XXX, XXX−XXX

Journal of Natural Products

Article

Figure 2. Four best AIC scores for artemisinin (1) diastereoisomeric forms. Top left: 13C unassigned data. Top right: assigned 13C data. Left bottom: 1 DCH RDCs. Right bottom: 13C assigned data and 1DCH RDCs. Configurations named after the centers C1, C4, C5, C6, C7, C10, and C11. See Tables S1−S4 (Supporting Information) for the full data.

scripts were developed to automatically launch the DFT chemical shift computations. Due to the geometrical restrictions imposed by the peroxide ring, only 32 diastereoisomers from the expected 64 were obtained. A list of 13C chemical shifts obtained through automatic peak-picking in MNova was then submitted to StereoFitter, a recent C++ implementation of our AIC-based algorithm, which shows a great improvement over the published Python code28 in terms of speed. When using 13C chemical shifts, StereoFitter can be run with assigned or unassigned data sets. In the later case, the experimental chemical shifts were assigned to each particular carbon by following the predicted order from the chemical shielding computations. Fitting unassigned chemical shifts resulted in moderate discrimination of the relative configuration of artemisinin (1), with AIC differences ΔAIC = AIC − AICmin of 13−14 over the closest scoring forms. This corresponded to relative probabilities, as defined in eq 4, over the preferred model of 0.1% or less. A higher discrimination (ΔAIC > 25, relative probability