Quantitative Assessment of Molecular Dynamics Sampling for Flexible

Jan 13, 2017 - The main reason for redesigning the clustering procedure was that we ran into memory limitations when we tried to perform the full clus...
0 downloads 12 Views 4MB Size
Article pubs.acs.org/JCTC

Quantitative Assessment of Molecular Dynamics Sampling for Flexible Systems Mike Nemec and Daniel Hoffmann* Bioinformatics and Computational Biophysics, Center for Medical Biotechnology, University of Duisburg−Essen, Essen D-45117, Germany S Supporting Information *

ABSTRACT: Molecular dynamics (MD) simulation is a natural method for the study of flexible molecules but at the same time is limited by the large size of the conformational space of these molecules. We ask by how much the MD sampling quality for flexible molecules can be improved by two means: the use of diverse sets of trajectories starting from different initial conformations to detect deviations between samples and sampling with enhanced methods such as accelerated MD (aMD) or scaled MD (sMD) that distort the energy landscape in controlled ways. To this end, we test the effects of these approaches on MD simulations of two flexible biomolecules in aqueous solution, Met-Enkephalin (5 amino acids) and HIV-1 gp120 V3 (a cycle of 35 amino acids). We assess the convergence of the sampling quantitatively with known, extensive measures of cluster number Nc and cluster distribution entropy Sc and with two new quantities, conformational overlap Oconf and density overlap Odens, both conveniently ranging from 0 to 1. These new overlap measures quantify self-consistency of sampling in multitrajectory MD experiments, a necessary condition for converged sampling. A comprehensive assessment of sampling quality of MD experiments identifies the combination of diverse trajectory sets and aMD as the most efficient approach among those tested. However, analysis of Odens between conventional and aMD trajectories also reveals that we have not completely corrected aMD sampling for the distorted energy landscape. Moreover, for V3, the courses of Nc and Odens indicate that much higher resources than those generally invested today will probably be needed to achieve convergence. The comparative analysis also shows that conventional MD simulations with insufficient sampling can be easily misinterpreted as being converged.

1. INTRODUCTION Conformational flexibility is fundamental to biomolecular function,1 and conceptual models of biomolecular interaction, such as induced fit2 and conformational selection,3 are built around conformational flexibility. Naturally, Molecular Dynamics (MD), a method designed to computationally simulate conformational flexibility with high resolution, has become a widely used instrument for the exploration of biomolecules.4 However, the enormous complexity of the conformational space and energy landscape of biomolecules severely limits the scope of MD. Numerous methods have been devised to shift this frontier to better sampling of ever larger systems (for reviews, see, e.g., ref 5). In this work we test two approaches for enhanced sampling. The first is strikingly simple and effective: the simulation of multiple trajectories from different initial conditions.6 The second approach (“enhanced MD”) levels the energy landscape to speed up the dynamics and thus to improve sampling. We use two implementations of enhanced MD, accelerated MD (aMD)7,8 and scaled MD (sMD),9 that either fill energy minima with a boost potential (aMD) or flatten the energy landscape with a scaling factor (sMD). The central question of this work is how can we quantify the efficiency of these and other approaches for the sampling of flexible molecules? Sometimes, it is possible to assess the quality of conformational sampling by direct comparison with experiment, as in protein folding simulations where results are © 2017 American Chemical Society

well-defined conformational clusters that can be directly compared to experimental structures10 (note that this also requires that the force field be sufficiently accurate11). The situation is less clear for molecules for which such information is not available. Examples are many proteins or complexes of as yet unknown structure or molecules that do not assume a welldefined structure, for instance, intrinsically disordered proteins.12 A recently published MD study of a peptide− membrane−water system has vividly illustrated the immense magnitude of computational effort necessary for sufficient sampling of such flexible systems.13 A comment on that study pointed out that the problem can easily evade recognition by standard methods for the assessment of sampling quality.14 However, the situation is probably worse because a sizable portion of published MD studies do not report even the use of those established methods. This is unfortunate because many of these methods are well-documented and software implementations are freely available,15,16 including, for instance, block averaging and PCA based methods,17 the cosine content,18 the decorrelation time,19 and effective sample size.20 In this work, we test a comprehensive approach to assess the sampling quality for a specific scenario, namely, the use of multiple Received: August 19, 2016 Published: January 13, 2017 400

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

improved sampling within a set of discovered clusters will only change Sc. Thus, a set of quantities could be used to draw a comprehensive picture that also allows checking for consistency. For instance, a necessary condition for good sampling in a multitrajectory experiment includes high values of Odens (which implies high Oconf), together with convergence of Nc.

trajectories and enhanced MD, a promising combination to improve sampling of flexible systems. We have studied two such flexible systems. First, the pentapeptide Met-Enkephalin21 is a highly flexible signaling molecule that interacts with specific opioid receptors. The combination of small size and complex conformational space has made Met-Enkephalin a popular system for benchmarking molecular sampling methods.22 The second system is the third variable loop (“V3”) of the Human Immunodeficiency Virus 1 (HIV-1) envelope protein gp120. This loop of 35 amino acid residues determines which cells are eventually infected by HIV1 by interacting with cell-specific HIV-1 co-receptors CCR5 or CXCR4.23 However, the notorious flexibility of V3 makes it difficult, if not meaningless, to determine its precise 3D structure (in fact, the available experimental structures differ strongly24,25). Nevertheless, it is a meaningful question for both flexible molecules, Met-Enkephalin and V3, which conformations are sampled because this will determine the specific interactions with their receptors. For these two molecules, we comparatively assess the quality of sampling obtained with multiple trajectories generated by conventional and enhanced MD. For a comprehensive picture, we study several aspects with a set of quantities that will be described in detail in the Molecules and Methods section: 1. Do different trajectories cover the same conformational space? This is assessed by the conformational overlap Oconf, conveniently ranging from 0 (no overlap) to 1 (complete overlap). 2. Do different trajectories produce the same conformational probability distribution? The degree of fulfillment of this more stringent criterion is quantified by the density overlap Odens, also ranging from 0 to 1. Because the enhanced MD methods deliberately distort the energy landscape and thus the sampling, we have to apply a reweighting correction for a proper evaluation of Odens. 3. Complementary to measuring Odens is the evaluation of the cluster distribution entropy Sc;26 two trajectories with perfect density overlap (Odens = 1) should have the same cluster distribution entropies. On the other hand, having similar cluster distribution entropies in two trajectories does not imply a high Odens because similar cluster distribution entropies can be generated in different regions of the conformational space (i.e., at low Odens). Importantly, an increase of the cluster distribution entropy is an indicator for improvements in sampling, either by convergence of the sample to the true conformational probability distribution or by sampling of new conformational clusters. 4. The question of “unknown unknowns” is certainly the most difficult one:14 are parts of the conformational space still missed by the sample? However, we can at least answer the weaker question: Is there evidence that we are still in the process of discovering new areas in conformational space? An indicator of this is the convergence of conformational cluster count Nc.27 Although these four quantities emphasize different aspects of sampling, they are not completely independent. For instance, a complete density overlap between different trajectories (Odens = 1) requires a complete conformational overlap between these trajectories (Oconf = 1). Or, ongoing discovery of new conformational clusters will increase both Sc and Nc, while

2. MOLECULES AND METHODS 2.1. Biomolecules. Met-Enkephalin. The initial structures for the pentapeptide Met-Enkephalin were taken from the two NMR model ensembles in Protein Databank entries 1PLW and 1PLX.28 We selected the two structures, here called Met79 and Met153, that had the largest root-mean-square deviation after optimal superposition (top of Figure 1). The termini were ̈ capped with ACE and NME moieties.

Figure 1. Simulated molecules Met-Enkephalin (top) and V3 (bottom) with structures and amino acid sequences. For each of the two molecules, two initial structures based on experimental data were used, Met79 and Met153 for Met-Enkephalin and V3a and V3b for V3.

V3. We selected a V3-loop sequence of an R5-tropic HIV-1 strain from the Los Alamos HIV database (GenBank entry AF112548). Candidates for the MD starting structures were modeled with this sequence and several crystal structures as templates (PDB entries 2B4C,24 2QAD,25 and 1CE429) using Blastp30 alignments and the automodel function of Modeler v9.13.31 For starting structure V3a, we used 2QAD as the template, and for starting structure V3b, we performed a multitemplate modeling with all three templates. In each of the two modeling runs, the highest scoring model out of five candidates was selected as a starting structure (structures V3a and V3b in Figure 1). The terminal cysteins were linked with a disulfide bridge, and the termini were capped with ACE and NME groups. Histidines were protonated on Nϵ2. 2.2. MD Simulations. The dynamics of Met-Enkephalin and V3 was simulated on GPUs with the Amber 14 software32 and the ff99SB-ILDN force field.33 Hydrogens were added to the experimental structures with the Amber program TLEAP. 401

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation E P = ⟨E P,cMD⟩ + αP

First, the energy of the peptides was minimized by 15000 steps of steepest descent and 15000 steps of conjugate gradient minimization in vacuum. Periodic boundary conditions were applied with a truncated octahedron and a distance between the molecule and box boundary of 1.1 nm. The peptide in the box was supplemented with TIP3P water molecules34 and, in the case of V3, three chlorine ions to neutralize the system. Then, the system was energetically minimized with position-restrained peptide atoms, followed by minimization with released atoms with the same numbers of steps as above. A 1 nm cutoff was applied to nonbonded interactions, and long-range electrostatics were computed with the Particle Mesh Ewald method.35 Lengths of bonds involving hydrogen atoms were constrained with the SHAKE algorithm.36 The water was relaxed with position-constrained peptide heavy atoms by a 1 ns NVT simulation in which the system was heated up over 1 ns from 0 to 300 K with the Langevin thermostat option. Simulations were then continued at 300 K (Langevin thermostat) and 1 atm (Berendsen barostat) in the NPT ensemble with a 2 fs integration time step and a collision frequency of 2 ps−1. This was followed by another pair of 1 ns NVT and NPT simulations without position constraints. The complete system was then finally energetically minimized with the same numbers of steps as above and heated up over 1 ns from 0 to 300 K and simulated (NPT) at this temperature without constraints but, as before, under the control of a Langevin thermostat and Berendsen barostat. This NPT run was continued as a production simulation. A representative set of Amber input files for energy minimization, heating, equilibration, and production is given as a zip archive in the Supporting Information. Production simulations were performed with three simulation methods, conventional MD (cMD), sMD, and aMD. Snapshots were stored every 10 ps. For each of the six combinations of the two starting structures and each of the three methods (cMD, aMD, sMD), we generated several trajectories. For Met-Enkephalin, we simulated for each combination 1 × 100 and 4 × 200 ns, along with 3 × 1 000 ns for cMD and aMD and 3 × 500 ns for sMD, that is, a total of 48 trajectories. For V3, we simulated for each combination 3 × 100 and 7 × 200 ns, that is, a total of 60 trajectories. cMD production simulations were just continuations of the conventional NPT simulations for up to 100, 200, or 1000 ns. In aMD simulations, boost potentials ΔV(r) were applied to lift potential energies below certain thresholds Eb and thus to ease conformational transitions.8,37 Hence, simulations were performed with boosted potentials V*(r) instead of the normal force field V(r)

E D = ⟨E D,cMD⟩ + 5αD

αD =

4 Nresidues 5

(3)

(4)

with ⟨E.,cMD⟩ as the averaged energies from corresponding cMD simulations and Natoms and Nresidues as the numbers of atoms and residues. In sMD simulations, the force field potential was scaled down by a factor of λ = 0.79 to a scaled potential V*(r): V *(r) = λV (r)

(5)

2.3. Overlap Measures Oconf, Odens. With the overlap measures Oconf and Odens, we compare the sampling between different trajectories of the same molecule. For instance, we could call sampling with two independent trajectories ideal if they both produce, within a small statistical error, the same equilibrium probability distribution p(r) that covers the complete accessible conformational space of that molecule (Figure 2). For a detailed assessment, we can split up this requirement into three criteria:

Figure 2. Overlap between trajectories for ideal and poor sampling (top and bottom, respectively). The two independent MD trajectories traj1 and traj2 of equal lengths sample a potential energy landscape (left column) well or poorly, respectively, leading to the same or to different estimates of conformation probability (middle column). Ideally, we have complete overlap between the trajectories, which means that each conformation in traj1 has the same number of geometric neighbors in both traj1 and traj2, and the same is true for each conformation in traj2 (top right). Conversely, if sampling in independent trajectories traj1 and traj2 is poor, overlap will usually be low (bottom right).

1. The independent trajectories cover the same conformational space. 2. They cover it with the same probability p(r). 3. The covered space is the complete accessible space. We quantify fulfillment of the first criterion with the conformational overlap Oconf that lies between 0 (different trajectories cover completely different conformational regions) and 1 (different trajectories cover the same region). Fulfillment of the second criterion is measured with the density overlap Odens, also lying between 0 (completely different probability p(r) inferred from independent trajectories) and 1 (p(r) the same for all independent trajectories). A necessary condition for the fulfillment of the third criterion, that is, completeness of sampling, is that the number of conformational clusters or the cluster distribution entropy be

V *(r) = V (r) + ΔV (r) = V (r) + ΔVP(r) + ΔVD(r) (1)

with a combination of two boost potentials ΔVD(r) for dihedral energy terms and ΔVP(r) for total potential energy ⎧0 for V.(r) ≥ E. ⎪ 2 ⎨ ΔV.(r) = (E. − V.(r)) ⎪ for V.(r) < E. ⎩ α. + (E. − V.(r))

αP = 0.16Natoms

(2)

with one equation for · = D and one for · = P. Separate thresholds EP, ED were applied to the total potential and the dihedral potential, respectively, as given in ref 38 402

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Here, er,κl is the number of events a conformation κ (reference frame) collects from trajectory l (l can be the trajectory that has generated κ or any other trajectory), nl is the number of frames of that trajectory, H(x) is the Heaviside step function, r is the neighborhood threshold parameter, RMSD(κ,il) is the RMSD (eq 6) between the reference frame κ and the ith frame of trajectory l, and wr,il is the weight of the ith frame of trajectory l. For convenience, we will sometimes merge the indexes il in wr,il to one, as in wr,λ. The weights wr,il are introduced to correct for the perturbations of the energy landscape in aMD or sMD simulations that lead to a relative overestimation of the frequency of higher potential energy conformations and thus to biased event numbers er. To eliminate this bias, we have to reweight the event numbers accordingly. The weights for the considered sampling methods are listed in the following, with superscripts indicating the sampling method cMD, aMD, or sMD. For a conformation j coming from a cMD trajectory l, we do not reweight, so that formally we have

converged. In practice, the condition is not sufficient because apparent convergence can also result from conformational trapping. A multitrajectory approach and enhanced sampling techniques could here make a difference. Although Oconf and Odens do not directly address the third criterion for ideal sampling, that is, complete coverage of conformational space, deviations from complete sampling can impact both overlap measures. For instance, if one trajectory misses large parts of the conformational space, a second independent trajectory with different initial conditions could explore those missed parts and thus have little overlap with the first trajectory. Thus, incomplete coverage of conformational space can lead to low overlap, and complete coverage can increase Oconf to 1. Of course, high overlap is no guarantee for complete sampling but may also be caused by trapping of both trajectories in the same local energy minimum. However, the more independent trajectories with different initial conditions are added, the less likely they are all trapped in the same minimum. Neighborhood Threshold r and Event Number e r. It is clear that two independent trajectories of finite length will rarely produce the numerically exact same conformation, even if they both explore the same energy minimum and thus should have high overlap. For a practically useful overlap definition, we therefore introduce a threshold r; two conformations are considered the same if they are closer to each other than r (if they are “r neighbors”). We measure the distance between two conformations j, k of the same molecule of J atoms with masses mi by their mass-weighted root-mean-square deviation after optimal superposition27,39 ⎛ ∑ J m |x − x |2 ⎞1/2 ik i = 1 i ij ⎟ RMSD(j , k) = ⎜⎜ J ⎟ ∑i = 1 mi ⎝ ⎠

wr(cMD) =1 , jl

For conformations j coming from an aMD trajectory l, we have implemented three reweighting variants with different numerical properties; exponential reweighting (Exp) considers each conformation jl with a reciprocal Boltzmann factor for its specific boost potential ΔVjl, a numerically more robust approach is the McLaurin expansion up to order m (McL),38,42 and the simple approach used in this article is a mean-field approximation (MF) ⎧ exp( +β ΔVjl) Exp ⎪ m ⎪ βi ⎪ McL ΔV jli ∑ ⎨ wr(aMD) = , jl i ! ⎪ i=0 ⎪ ⎪ exp( +β ΔV r(,njl)) MF ⎩

(6)

with positions xij, xik of atom i in conformations j, k. For the evaluation of the RMSD, we imply that the molecular structures j, k have been optimally superimposed.40 In practice, we optimally superimposed structures j, k to a reference structure l, such as the initial structure of a trajectory, and then computed RMSD(j,k). Except for theoretically conceivable but practically irrelevant ill-conditioned structures, this procedure is mathematically equivalent (see Supporting Information Figures 13 and 14) to a direct superposition of j, k but technically advantageous. Reported RMSD values refer to positions of heavy atoms of the peptide backbone. For each pair of trajectories of the same molecule, RMSD matrices were generated with GROMACS v4.6.7.41 For given threshold r, we consider j, k as the same conformation if RMSD(j,k) ≤ r. The elementary operation necessary for evaluating the overlap measures is to count how often we see the same conformation in independent trajectories. We call each occurrence of RMSD(j,k) ≤ r an event, and determining overlaps therefore requires counting of the number er of events

∑ wr ,il ·H(r − RMSD(κ , il)) i=1

(10)

with thermodynamic temperature factor β, boost potential ΔVjl applied to frame j of trajectory l of nl frames, and McLaurin expansion order m. The nth iteration mean-field average of the boost potential is given by ΔV r(,njl+ 1) =

1 Nr , jl

nl

∑ ΔV r(,nil)·H(r − RMSD(i , j))

(11)

i=1

with ΔV(0) r,il = ΔVil the boost potential applied to conformation l il and Nr,jl = ∑ni=1 H(r − RMSD(i,j)) the number of frames of trajectory l in an r neighborhood of frame j. The MF weights depend on r, similar to the binning approach in ref 9. In practice, we have used ΔV(1) r,jl in the MF variant of eq 10, that is, the average of the boost potentials applied to conformations i in the r neighborhood of frame j. For a conformation j coming from an sMD trajectory l, we reweight with an average over its r neighborhood in l

nl

er , κl =

(9)

(7)

nl

1

wr(sMD) ≡ wr(,njl+ 1) = (∑ wr(,njl)·H(r − RMSD(i , j)))λ − 1 , jl

with

i=1

⎧ 1 (x > 0) H (x ) = ⎨ ⎩ 0 (x ≤ 0)

(12)

⎪ ⎪

with potential scaling factor λ, with w(sMD) = w(1) r,jl r,jl , that is

(8) 403

w(0) r,jl

= 1. In practice, we worked

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation λ) − 1 wr(sMD) = Nr(1/ , jl , jl

generate trajectory k. For error estimation (see the section “Overlap Errors”), it is advantageous to interpret Odens as the average of an event function fdens(k,L;r), defined in eq 17, over the set of reference trajectories K. fdens(k,L;r) quantifies how in the r neighborhood of each reference frame κ ∈ k the probability density (or event density) varies between trajectories l in the comparison set L. Each of the trajectories k ∈ K and l ∈ L is considered a statistical sample of the system. A necessary condition for convergence of each trajectory k, l is that each ratio of minimum and maximum ẽr,κl in eq 17 be approximately 1 because then each frame κ ∈ k should see approximately the same number of r neighbors in all l ∈ L. Odens will therefore be close to 1 for converged K, L. On the other hand, the minimum to maximum ratio can drop to 0 if κ sees largely varying r neighborhoods between different l, leading to a minimum Odens = 0 for unconverged trajectories. We apply Odens mainly to monitor the convergence of sampling in a set of trajectories (K = L). In another application (see section 3.7), we quantify the overlap between groups of concatenated trajectories (group overlap). A typical example of group overlap is the overlap of the complete conformational density sampled by the group of all cMD trajectories (i.e., concatenation of all cMD trajectories), with the complete conformational density sampled by the group of all aMD trajectories (i.e., concatenation of all aMD trajectories). To illustrate the case K = L for monitoring sampling convergence, imagine that we want to test whether each of two MD trajectories produces the same population density (perfect overlap). If they do, each frame κ sees the same density in its own and in the other trajectory, so that the minimum and maximum in the ratio in eq 17 are the same and the ratio is 1. Then, clearly Odens = 1. If, on the other hand, the overlap is very small, the minimum in the numerator (coming from the other trajectory) will be much smaller than the maximum in the denominator (coming from the same trajectory) and Odens ≈ 0. Note that in Odens (eq 17), each frame from an enhanced sampling trajectory is reweighted to correct for the population density distortion introduced by the boost potential or scaled potential. The reweighting occurs with wr,κ if κ is a frame from an enhanced sampling trajectory k, and it occurs with ẽr,κl if l is an enhanced sampling trajectory. In general, we have Odens ∈ [0,1], with 0 for no overlap and 1 for perfect overlap of conformational population densities. Averaged Overlaps. Perfect sampling should yield Oconf ≈ Odens ≈ 1 for all values of the neighborhood threshold r. Under these conditions, we should obtain for rmin = 0 and a reasonable maximum value rmax the averaged overlaps

(13)

Using any of the weights wr,il introduced above, we can define a normalized event number ẽr,κl that a reference frame κ observes in some trajectory l er , κl er̃ , κl = nl ∑i = 1 wr , il (14) ẽr,κl ∈ [0,1] is the (appropriately reweighted) fraction of trajectory l that falls into the r neighborhood of conformation κ. Conformational Overlap Oconf. We define the conformational overlap Oconf between a set K of trajectories (with the total number of nK frames) and a set L of trajectories for given neighborhood threshold parameter r as Oconf (K , L ; r ) =

1 nK

∑ δ(∏ er ,κl) κ∈K

l∈L

(15)

with ⎧ 1 for x ≠ 0 δ(x) = ⎨ ⎩ 0 for x = 0

(16)

The sum in eq 15 counts the number of reference frames κ ∈ K that have at least one r neighbor in each of the trajectories l ∈ L. Division by nK normalizes Oconf to the interval [0,1]. Intuitively, Oconf is a quantitative answer to the question: If we sit on each of the nK frames in K in turn and look into the set L of trajectories, how often do we see in each of the trajectories l ∈ L at least one frame in our r neighborhood? If each of the nK frames sees r neighbors in each l, the overlap Oconf(K,L;r) is 1. Then, K and L sample the same conformational space, though not necessarily with the same probabilities. This is a stringent generalization of a comparison of just two trajectories: If K and L are two single trajectories, Oconf = 1 if for each frame in K there is at least one r neighbor in L. Conversely, if for each frame in K there is no r neighbor in L, we have Oconf = 0, that is, the two trajectories sample completely disjunct parts of conformational space. Note that Oconf is not commutative. For instance, if K is a small sample far from convergence and L is a large, an exhaustive sample is then Oconf(K,L;r) = 1 but Oconf(L,K;r) ≈ 0; alternatively, using the above intuitive argument, each frame of the small sample K sees r neighbors in the exhaustive sample L, but not vice versa. We mainly used Oconf(L,L;r) to quantify the self-consistency of trajectory sets L. Moreover, we applied Oconf to compare trajectories or trajectory sets sampled with the same method (but perhaps different initial conformations) or with different methods (cMD, aMD, sMD). Density Overlap Odens. We have a perfect overlap between different trajectories if each frame sees in its conformational neighborhood in each trajectory the same conformational density. One way to implement this concept is the following density overlap Odens quantity Odens(K , L ; r ) =

1 nK

∑ k∈K

Ω.(K , L) =

∫r

rmax

min

O.(K , L ; r ) dr = 1

(18)

for both Oconf and Odens. In general, Ωconf, Ωdens ∈ [0,1]. We integrated the overlap measures numerically with Simpson’s rule over an interval of r values corresponding to a RMSD range that covered 99% of the respective RMSD distribution. Overlap Errors. Odens (eq 17) is formulated as an average of an appropriately defined event function fdens(k,L;r) over a set of reference trajectories K. We here interpret Oconf (eq 15) in a corresponding way, that is, as average of fconf(k,L;r) over a set of reference trajectories K. To estimate overlap uncertainty, we consider the distribution of the individual f.(k,L;r) values of the event functions and compute quantiles of this distribution. For instance, in line graphs of overlap results, we plot the first, second (median), and third quartile of the distribution of

min{e∼r , κl : l ∈ L} ∑κ ∈ k wr , κ κ ∈ k max{e∼r , κl : l ∈ L}    1

rmax

1 − rmin

∑ wr ,κ ·

≡ fdens (k , L ; r )

(17)

with K as a reference set of nK trajectories k with frames κ and L a comparison set of trajectories l. Here, ẽr,κl are normalized event numbers (eq 14). The weights wr,κ are those defined in eqs 9, 10, and 12, depending on the sampling scheme used to 404

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 3. RMSD distributions for all forty-two 200 ns simulations (cMD, aMD, sMD) of Met-Enkephalin. (Left) RMSD values of all pairs of frames in each single trajectory (each curve corresponds to one trajectory). (Center) Distribution of RMSD values between pairs of frames from different trajectories. (Right) Averages of left (blue), center (red), and all (black); red vertical lines enclose 99% of the probability. Nc

f.(k,L;r) values. Errors of averaged overlaps were estimated from the integrals over all first quartiles (lower error bar) and third quartiles (upper error bar). 2.4. Cluster Number and Cluster Distribution Entropy. For the computation of a necessary criterion of sampling completeness, we monitored the development of cluster number Nc27 and cluster distribution entropy Sc26 in the set of trajectories. In the following, the term clustering signifies a complete partitioning of the sampled conformational space at an approximately homogeneous resolution. As resolution parameter we used again the neighborhood threshold r introduced in the previous section on overlap measures, though here we did not look at the r neighborhoods of all sampled conformations, but we constructed a contiguous partitioning of the sampled conformational space in nonr overlapping chunks of RMSD radius R with 2 ≲ R ≤ r . In the following, we outline the clustering algorithm. Initially, we consider all sampled frames as potential cluster centroids. We determine the first actual cluster centroid C1 as the sampled conformation that has the lowest RMSD to a given reference structure, for example, to one of those in Figure 1. The second centroid, C2, is the frame outside of the r neighborhood of C1 that is closest to C1. Frames in the r neighborhoods of C1 or C2 are discarded from the list of potential cluster centroids. Centroids Ci (i = 3, 4, ..., Nc) are then determined by iterating over two steps: (1) we compute an auxiliary center Ai as the coordinate average of Ci−1 and Ai−1 (in the first iteration: A2 = C1), and then determine as Ci the frame in the remaining list of potential centroids that is closest to Ai; (2) frames in the r neighborhood of Ci are discarded from the list of potential centroids. The iteration stops when the list of potential centroids is empty. After all Nc cluster centroids have been determined in this way, the clustering is completed by assigning each sampled frame to its closest centroid. The main reason for redesigning the clustering procedure was that we ran into memory limitations when we tried to perform the full clustering of multiple long trajectories at high resolution. The software implementing the above clustering procedure has been optimized so that it can process large RMSD matrices with moderately sized memory and CPU time. For the range of trajectory sizes and numbers that could be clustered also with established approaches (hierarchical clustering with complete linkage43,44 and pamk45,46), we found that the behavior of the cluster-based quantities Nc and Sc were robust with respect to the clustering method (not shown). The cluster distribution entropy Sc was computed with

Sc = −∑ pi log pi i=1

(19)

where pi is the number of frames assigned to cluster i, divided by the total number of frames. We characterized changes of Nc or Sc by linear least-squares regression over the last time interval of an appropriately chosen size Δt. The slopes of the linear models and their 95% confidence intervals are reported in the Supporting Information Figures 1−4. For the evaluation of probabilities pi in eq 19, the same reweighting was performed for aMD and sMD as that in Odens with MF correction and r = 0.11 nm (Met-Enkephalin) and r = 0.35 nm (V3). A software package written in Python that implements computation of Oconf, Odens, Nc, and Sc for multitrajectory experiments with cMD, aMD, and sMD is freely available as source code at https://github.com/MikeN12/ PySamplingQuality.

3. RESULTS AND DISCUSSION The main question that we study here is how does the sampling quality for flexible biomolecules change as we go from a single trajectory to multiple trajectories and from conventional to enhanced MD sampling (aMD, sMD)? We will give a comprehensive answer to this question by reporting Oconf, Odens, Nc, and Sc for different simulation conditions for the smaller Met-Enkephalin and for the larger V3-loop. If not explicitly mentioned otherwise, all results are based on sets of seven 200 ns trajectories for each combination of the two starting structures (Met79, Met153 and V3a, V3b) and the three methods (cMD, aMD, sMD), that is, two sets of 42 trajectories, one for Met-Enkephalin and one for V3. 3.1. RMSD Distributions. The overlap and cluster measures were computed from RMSD matrices of single or multiple trajectories. It is already instructive to inspect the distribution of RMSD values in these matrices directly. For instance, two independent trajectories sampling the same free energy minimum would lead to a monomodal RMSD probability distribution, whereas two independent trajectories sampling distinct free energy minima would typically result in a bimodal RMSD probability distribution. For the Met-Enkephalin experiments, the RMSD distributions of single 200 ns trajectories and pairs of these trajectories are basically the same monomodal distributions (Figure 3). For all sampling protocols and irrespective of single or multiple trajectory, the distribution of RMSD values has the same bell shape in the range of about 0.01−0.45 nm, with a maximum probability at 0.17 nm. This could be explained by a good sampling of Met-Enkephalin with single 200 ns cMDtrajectories that is not improved by multiple trajectories or enhanced sampling methods. 405

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 4. RMSD distributions for all forty-two 200 ns simulations (cMD, aMD, sMD) of V3. (Left) RMSD values of all pairs of frames in each single trajectory (each curve corresponds to one trajectory) for trajectories generated by cMD (first row), aMD (second row), and sMD (third row); red and black curves correspond to different starting conformations. (Middle) Distributions of RMSD values between pairs of frames from different trajectories, one panel for each sampling method. (Right) Averages as in the right panel of Figure 3.

Figure 5. Cluster number as a function of neighborhood threshold r for Met-Enkephalin (left four panels) and V3 (right four panels) in log−log plots. Boxes with whiskers show the maximum (upper whisker), third quartile (upper edge of box), median (central line), first quartile (lower edge), and minimum (lower whisker) of the cluster number Nc. Lines are fits of Nc(r) = αr−β with fitted parameters α, β. Numbers in the plot are exponents β and their 95% confidence intervals. For each combination of starting conformation and sampling method, seven 200 ns trajectories were evaluated. (a) Clustering of all trajectories of the respective molecule. (b−d) Clustering of cMD, aMD, and sMD trajectories, respectively, from different starting conformations. Colors red and black indicate starting conformations.

RMSD ranges (left column of Figure 4), the ranges seen with multiple trajectories are shifted by about 0.2 nm to higher values (middle column of Figure 4). The right panel of Figure 4 makes the benefit of the multiple trajectory approach for V3 more clearly visible; the average RMSD shifts from about 0.5 to about 0.8 nm if we combine multiple trajectories. 3.2. Cluster Numbers as a Function of Neighborhood Threshold r. Because Oconf, Odens, Nc, and Sc have the neighborhood threshold r as a parameter, we asked next whether there is an optimal choice of r. For instance, the RMSD distributions in Figures 3 and 4 suggest that for Met-

For V3, the picture is completely different. Of course, all RMSD distributions are shifted to the right because V3 is much larger than Met-Enkephalin. However, the most remarkable difference is that we have a highly diverse set of irregularly shaped, in general, multimodal, RMSD distributions. In comparison with cMD, distributions from aMD and sMD runs are relatively weaker at low RMSDs and stronger at higher RMSDs, as expected from sampling methods that drive the system out of local minima. Even with cMD, the combination of different trajectories has a big impact on the range of explored RMSD values. In comparison to the single-trajectory 406

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 6. Overlap measures Oconf (left column) and Odens (center) and averages Ωconf and Ωdens (right column) for trajectories of Met-Enkephalin (blue traces and axis labels) and V3 (red traces and axis labels). (Top row) For each method, cMD, aMD, and sMD, 14 trajectories of 200 ns length were evaluated (7 for each of the two starting conformations); trace “ALL” combines all forty-two 200 ns trajectories of a molecule. (Bottom row) Same set of 200 ns trajectories, split according to starting structures (X = Met79 or V3a, Y = Met153 or V3b). Error bars as described in Molecules and Methods section “Overlap Errors”.

Enkephalin at r ≳ 0.17 nm and for V3 at r ≳ 0.8 nm the most resolution is lost and the overlap measures will trivially approach a value of 1, even if sampling is bad. One way of determining an optimal r would be to observe the number of clusters as a function of r and then choose a value of r where a marked change of cluster number occurs. Figure 5 shows that for the two flexible molecules studied here, there is no good candidate for an optimal r because both systems show the power-law distribution that is the hallmark of a scale-free system. This is the behavior expected for clustering the same random walk at different resolutions.47 For Met-Enkephalin, we have a very good fit of the cluster number Nc(r) to a power-law αrβ with an exponent β at around −4.7, with only one outlier at r = 0.15 nm. The fits for trajectories with different starting conformations are visually not distinguishable (Figure 5, left quartet). For V3, the fit is still good, but here, the spread of cluster numbers is much larger and differences between trajectory sets starting from different conformations lead to deviating fits, though the exponents are not significantly different. Neither for Met-Enkephalin nor for V3 is there a feature in the Nc(r) plots that would clearly mark an optimal r. A notable feature of the V3 cluster number is that combined clustering of all 42 trajectories of 200 ns length (Figure 5a of right quartet) leads to a strong increase of the number of clusters compared to smaller groups of seven 200 ns trajectories. This means that the different groups of trajectories explore different clusters or different parts of conformational space, probably with little overlap between groups, as analyzed in the next section. With all trajectories combined, the exponent β ≈ −4.26 for V3 approaches the one of MetEnkephalin. 3.3. Overlap Measures. High overlap within a diverse set of trajectories is a necessary condition for converged sampling.

This implies for instance that trajectories starting from different initial structures should sample the same conformational regions with the same densities. In the following, we study therefore the overlaps for groups L of trajectories, that is, Oconf(L,L;r) and Odens(L,L;r). Either such a group is given by all trajectories sampled with a specific method (e.g., all fourteen 200 ns trajectories sampled with cMD), or a group is given by all trajectories sampled with a specific method and starting from one of the two starting structures (e.g., all seven 200 ns cMD trajectories starting from Met79). For Met-Enkephalin, the conformational overlap Oconf is 1 for neighborhood thresholds as low as r = 0.1 nm (blue traces in the upper left of Figure 6), irrespective of the sampling method. Accordingly, the corresponding Ωconf values (blue stars in the upper right of Figure 6) are also close to 1. This means that at resolution r ≳ 0.1 nm the conformational space available to the ensemble of trajectories is completely covered by each single trajectory. The density overlap Odens converges much more slowly to 1 with increasing r (blue traces in the top central panel of Figure 6). At r ≈ 0.1 nm, we have Odens ≈ 0.2, and at r ≈ 0.17 nm, where the resolution is already very coarse according to Figure 3, we reach Odens ≈ 0.5−0.6. Thus, we are far away from reaching a converged density at an acceptable resolution. The corresponding Ωdens values between 0.6 and 0.7 (blue bullets in the right panel of Figure 6) have to be interpreted as not satisfactory. Nevertheless, the sampling method had a noticeable impact on Odens and Ωdens. In particular, aMD performed consistently better. At low values of r, there is an interestingly different course of Odens of aMD on one hand and of cMD and sMD on the other; for r ≲ 0.1 nm, Odens for aMD trajectories collapses to 0, while sMD and cMD still have Odens ≈ 0.15. This difference is probably due to the tendency of aMD to systematically destabilize the many small local minima at this fine resolution that then are not visited in 407

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 7. Overlap measures Oconf (top row) and Odens (bottom row) for fixed r as a function of simulated time. (Left) For each pair of MetEnkephalin starting structure (Met79 and Met153) and method (cMD, aMD, sMD), seven trajectories were evaluated at times ≤200 ns and three at 500 ns for neighborhood threshold r = 0.11 nm. (Middle) For each pair of V3 starting structure (V3a and V3b) and method (cMD, aMD, sMD), seven trajectories were evaluated at r = 0.35 nm. (Left) Same as the middle column but with r = 0.7 nm. Error bars are as described in Molecules and Methods section “Overlap Errors”.

different parts of this space with little overlap. For the smaller Met-Enkephalin, this explanation is less likely true; here, we might see the effect of a bias of the enhanced sampling methods (see also section 3.7). For the planning and assessment of MD experiments, it is important to estimate the time necessary for convergence of trajectories. The conformational space of Met-Enkephalin can be sampled reasonably well at a resolution of r = 0.11 nm in a time of about 100 ns that is typical of current MD simulations (top left of Figure 7). However, to see an approximately converged Odens at a good resolution, we need sets of simulations starting from different conformations and with lengths of the order of 1 μs (bottom left of Figure 7). This sampling effort is still in the purview of typical concurrent MD studies, though surprisingly large for systems of this small size. The power-laws of a number of clusters as a function of resolution r (Figure 5) suggest that higher resolution will require a strongly increasing sampling effort to visit all clusters with about the same relative frequency in all trajectories of a set. In fact, we find for Met-Enkephalin that the simulation time required to achieve convergence of Oconf as a function of r again follows approximately a power-law (Supporting Information Figure 7). This means that for a coarser resolution r, less simulation time is needed to achieve a given value Oconf (or Odens). However, it is not advisable to draw from this the conclusion to choose a very large value of r in order to achieve a high overlap with relatively short simulations because short simulations usually will lead to inaccurate estimates of observables (for an example, see Supporting Information Figure 15). It is useful to think about r as a spatial resolution parameter and to ask what spatial resolution is acceptable for the observable of interest. A much larger investment is needed for a V3 (middle and right columns of Figure 7). The conformational space cannot be charted with a similar simulation time as Met-Enkephalin. In fact, at the three times lower resolution of r = 0.35 nm, the conformational overlap is low or even decreasing in the first 200 ns with increasing simulation time if trajectories in L have the same initial structures. Thus, at this resolution, we are sampling more new conformations with increasing simulation

other trajectories. Conversely, sMD (for the chosen scaling parameter λ) and especially cMD may explore conformational space more locally. This is consistent with the stronger effect of the starting structures on Odens and Ωdens (bottom center and right panels of Figure 6) for sMD and especially cMD. For V3 (red traces in Figure 6), the picture is qualitatively different from Met-Enkephalin. The conformational space of V3 is so vast that we have to choose r ≳ 0.8 nm for any sampling method to come close to a Oconf of 1. Remember that according to Figure 4 we have at r ≈ 0.8 nm a resolution that is so coarse that it is almost useless. In comparison to MetEnkephalin, Ωconf is almost halved for V3. For Odens, both enhanced sampling methods are consistently better than cMD, but for all of them, Odens < 0.2 at r ≈ 0.8 nm. The same message is conveyed by the relatively low values of Ωdens. The mild dependency of the cMD sampling on the starting structure observed for Met-Enkephalin becomes massive for V3 (bottom row of Figure 6). The highest Oconf and Odens values are reached for cMD with V3a as the starting structure, and the lowest Oconf and Odens values for cMD have V3b as the starting structure (bottom left panel of Figure 6). It is important to note that this does not mean that the cMD sampling starting from V3a is particularly good. On the contrary, as will become clear below when we study changes in cluster number changes, the set of cMD trajectories starting from V3a is trapped in a limited set of clusters and explores this local set more intensively. We have also lumped all forty-two 200 ns trajectories of each of the two molecules, sampled with all three methods from both starting conformations in one large group, and then computed Oconf(L,L;r) and Odens(L,L;r) of this large group (“ALL” in Figure 6). If the different methods sample basically the same conformational space with the same density, the overlap of this large group should not differ much from that of the individual groups sampled with a single method. Interestingly, we see for both Met-Enkephalin and V3 that for the large group (“ALL”) Oconf and Odens decrease. This basically means that the different methods sample partly different conformations and with partly different probabilities. In the case of V3, this could be a consequence of the huge conformational space so that the trajectories can diffuse into 408

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 8. Cluster number Nc as a function of simulation time for Met-Enkephalin (left) and V3 (right). Clustering for each of the seven 200 ns trajectories (distinguished by colors) per pair of starting structure and sampling method was performed at resolution r = 0.11 nm for Met-Enkephalin and r = 0.35 nm for V3, as described in the Molecules and Methods section. Columns correspond to different starting structures, and rows to different sampling methods cMD, aMD, and sMD.

Figure 9. Cluster distribution entropy Sc as a function of simulation time for Met-Enkephalin (left) and V3 (right). Clustering for each of the seven 200 ns trajectories (distinguished by colors) per pair of starting structure and sampling method was performed at resolution r = 0.11 nm for MetEnkephalin and r = 0.35 nm for V3, as described in the Molecules and Methods section. Columns correspond to different starting structures, and rows to different sampling methods cMD, aMD, and sMD.

For Met-Enkephalin, the cluster numbers Nc converge in all 200 ns trajectories (Figure 8, left panels), in agreement with the convergence of Oconf (top left of Figure 7). Accordingly, all numerical derivatives dNc/dt for the last 20 ns converge to 0 (Supporting Information Figure 1). Interestingly, the converged Nc values differ between trajectories and starting structures, especially for cMD. cMD generates the lowest and aMD the highest Nc. These differences of Nc are seemingly at odds with convergence. However, it is important to remember that Odens was not converged at this resolution (bottom left of Figure 7) and that the clustering is not completely robust against density variations. In fact, as Odens increases in the 500 ns trajectories, the cluster numbers converge, too (see section 3.6). For V3, the Nc picture is remarkably different from that of Met-Enkephalin (Figure 8, right panels) but is also consistent with Oconf and Odens. We have a strong dependence of Nc on the sampling method and starting structure. Several cMD simulations show largely different Nc values, each one apparently stable, usually with lower final Nc values for simulations starting from V3a. At the other end of the Nc scale are aMD simulations with final Nc values typically 2 or 3

time than we are revisiting clusters that we have sampled before (center top panel of Figure 7). If we analyze the overlap in two sets of structures that start from different conformations (squares in middle column of Figure 7), we do not make any visible progress from Oconf ≈ Odens ≈ 0 in the first 200 ns. Note that for each cMD, aMD, or sMD, we have used 14 of such 200 ns simulations. We have to go to a low resolution of r = 0.7 nm (right column of Figure 7) to see Oconf showing signs of convergence, though even at this resolution we are far from convergence of Odens. The comparison of sampling methods favors again aMD as a method that consistently performs relatively well. cMD simulations have the strongest dependence on starting conditions. For example, cMD sampling of V3a generates Odens > 0.8, probably because it is trapped in a local cluster (see next sections), while cMD sampling of V3b seems to explore an unbounded set of conformations and thus has Odens ≈ 0. 3.4. Cluster Number. The development of the cluster number Nc is a further indicator of convergence; a fully converged set of trajectories will settle on the same Nc in each trajectory. Nc then measures the size of the accessed conformational space. 409

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

Figure 10. Combined assessment of convergence for Met-Enkephalin (top row) and V3 (bottom row). For Met-Enkephalin, we contrast sets of 200 ns trajectories with sets of 500 ns trajectories, both at the resolution for Met-Enkephalin, r = 0.11 nm. For V3, we compare 200 ns trajectories at resolutions r = 0.35 nm (bottom left) and r = 0.7 nm (bottom right). In all cases, Odens (vertical axis) and Nc (horizontal axis) were computed for sets of trajectories sampled with cMD, aMD, and sMD from different starting conformations (a = Met79 and b = Met153; a = V3a and b = V3b). Note that axes in the top row have different scales. Points without a letter for the starting conformation encompass the 14 trajectories from both starting conformations. “ALL” corresponds to the set of all trajectories of the respective molecule evaluated at the given resolution r. Points in the top row have been manually shifted by a small amount (≪1 cluster) to avoid overplotting. Error bars are as described in Molecules and Methods section “Overlap Errors”.

possibly because here even the cluster number does not converge. This latter explanation is supported by the observation that negative slopes of Sc in Figure 9 often correspond to time intervals of constant cluster numbers Nc in Figure 8 (both figures use the same color code for trajectories), while positive slopes of Sc often coincide with increasing Nc. aMD shows again the most steady increase to the highest Sc values. For the interpretation of Sc, it is important to consider that a converged Odens ≈ 1 (which implies Oconf ≈ 1) will necessarily lead to a horizontal slope of Sc, but the opposite is not true in practice. In fact, for many trajectories in Figure 9, there are time intervals in which Sc is constant or approximately constant, while Oconf and Odens are far from 1. The case of V3 again makes the point that multiple trajectories experiments are an effective means to prevent misinterpretation as the diversity of Sc courses is a clear warning sign for unconverged simulations. 3.6. Combined Assessment of Convergence. For a comprehensive assessment of convergence, we combine at least two quantities, one quantity such as the number of clusters Nc as a measure of the size of the sampled conformational space and one quantity such as Odens providing information on consistent sampling within the explored conformational space (Figure 10). We can use such a combined assessment, for instance, to judge the convergence of a set of MD trajectories or to compare different sampling methods. For Met-Enkephalin (top row of Figure 10), there is overall little difference between the different sampling methods cMD, aMD, and sMD. At resolution r = 0.11 nm, none of the

times higher and, more importantly, most of them clearly not converged but with a steady increase. sMD traces have an intermediate position both in terms of the course of Nc and final Nc values. For V3, a multitrajectory experiment with independent starting structures and aMD is, according to the above observations, the best approach among those studied here to explore the conformational space. For a flexible molecule of this size, reliance on single trajectories or even pairs of cMD trajectories from the same starting structure can obviously lead to misinterpretations of MD experiments. 3.5. Cluster Distribution Entropy. The last indicator of convergence that we report is the cluster distribution entropy Sc (eq 19). Sc compounds information about cluster number and sampling of the different clusters. If both are converged, Sc as a function of simulation time should have a horizontal slope. For Met-Enkephalin (left half of Figure 9), all trajectories show signs of convergence, but small deviations from horizontal slopes remain in many simulations (Supporting Information Figure 3). The course of Sc seems to be mainly determined by the sampling of new clusters as it approximately converges on the same time scale as the cluster numbers Nc in Figure 8. On the other hand, Sc is not very sensitive to sampling disequilibrium between clusters. There are no strong dependencies on starting structure and sampling method, though cMD levels off at a lower value of about 3.5−3.8 compared to aMD at 3.7−4.0. For the V3 trajectories, the development of Sc conveys the insufficiency of sampling more clearly (right half of Figure 9), 410

DOI: 10.1021/acs.jctc.6b00823 J. Chem. Theory Comput. 2017, 13, 400−414

Article

Journal of Chemical Theory and Computation

respectively, each sampling the same molecular system exhaustively and correctly, then Odens is 1 for each set LA and LB, as well as for the combined set {LA,LB}. However, if A and B both sample exhaustively, but only A correctly and B incorrectly (e.g., B with a bias), then Odens will be 1 for LA and LB, but