CoCo-MD: A Simple and Effective Method for the Enhanced Sampling

Jan 8, 2019 - ... Journal of Natural Products · The Journal of Organic Chemistry .... CoCo-MD: A Simple and Effective Method for the Enhanced ... Comb...
0 downloads 0 Views 777KB Size
Subscriber access provided by United Arab Emirates University | Libraries Deanship

Biomolecular Systems

CoCo-MD: A Simple and Effective Method for the Enhanced Sampling of Conformational Space Ardita Shkurti, Ioanna Danai Styliari, Vivek Balasubramanian, Iain Bethune, Conrado Pedebos, Shantenu Jha, and Charles A. Laughton J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00657 • Publication Date (Web): 08 Jan 2019 Downloaded from http://pubs.acs.org on January 9, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

CoCo-MD: A Simple and Effective Method for the Enhanced Sampling of Conformational Space

Ardita Shkurti†§, Ioanna Danai Styliari†°, Vivek Balasubramanian‡, Iain Bethune#§, Conrado Pedebos†⊥, Shantenu Jha‡, Charles A. Laughton†*

†School of Pharmacy and Centre for Biomolecular Sciences, University of Nottingham,

University Park, Nottingham NG7 2RD, UK. ‡Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ

08854, USA. #EPCC, The University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait

Road, Edinburgh, UK.

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

⊥CAPES

Page 2 of 40

Foundation, Ministry of Education of Brazil, Brasília, Brazil.

CoCo (“Complementary Coordinates”) is a method for ensemble enrichment based on Principal Component Analysis (PCA) that was developed originally for the investigation of NMR data. Here we investigate the potential of the CoCo method, in combination with molecular dynamics simulations (CoCo-MD), to be used more generally for the enhanced sampling of conformational space. Using the alanine penta-peptide as a model system, we find that an iterative workflow, interleaving short multiple-walker MD simulations with long-range jumps through conformational space informed by CoCo analysis, can increase the rate of sampling of conformational space up to ten times for the same computational effort (total number of MD timesteps). Combined with the reservoir-REMD method, free energies can be readily calculated. An alternative, approximate but fast and practically useful, alternative approach to unbiasing CoCo-MD generated data is also described. Applied to cyclosporine A, we can achieve far greater conformational sampling than has been reported previously, using a fraction of the computational resource. Simulations of the maltose binding protein, begun from the ‘open’ state, effectively sample the ‘closed’ conformation associated with ligand binding. The PCA-based approach means that optimal collective variables to enhance sampling need not be defined in advance by the user, but are identified automatically and are adaptive, responding to the characteristics

ACS Paragon Plus Environment

2

Page 3 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

of the developing ensemble. In addition the approach does not require any adaptations to the associated MD code, and is compatible with any conventional MD package.

Introduction Adequate sampling remains a challenge in molecular simulation. This, as much as the accessing of increasingly long timescales, has been the driver for many decades of development in both hardware and software. Individual simulations have been accelerated through hardware developments such as MD-GRAPE1 and ANTON2. New integrators open up the possibility of accelerating simulations by extending the fundamental timestep3. An alternative approach has been to replace single, long simulations with multiple, shorter ones4,5, and possibly integrating the data through approaches such as Markov Chain Models6 or Replica Exchange strategies7. A complementary approach is to accelerate sampling by modifying – generally flattening the effective potential energy surface, e.g. as in metadynamics8, accelerated dynamics9 and simulated tempering10.

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 40

Underlying many of these methods is a process of reducing the dimensionality of the search space, and collective variables (CVs) are commonly used to do this. However, choosing an effective set of CVs for use in such methods is a major challenge, particularly if one does not wish to impose a set of these a priori, or if one wishes to adapt them during the simulation. A variety of novel, and established, algorithms for the unsupervised and adaptive construction of CVs exist, but software to close the loop by providing onthe-fly integration of MD simulation and analysis is limited. The work of Preto and Clementi 11, interleaving cycles of MD simulation with data analysis through Locally Scaled Diffusion Maps 12, is an example of this approach and has been shown to be able to increase sampling rates by an order of magnitude. Related methods include the nontargeted PaCS-MD method of Harada and Kitao13, variants thereof 14, and the PCA-based method of Peng and Zhang15.

CoCo (“Complementary Coordinates”) is a PCA-based method developed originally for assessing and potentially enhancing conformational variety within NMR-derived ensembles of molecular structures16. The basis of the CoCo method is as follows. Firstly, the original ensemble of size N (which in the case of deposited NMR data, is typically is between ten and fifty structures) is analysed by PCA in Cartesian coordinate space. Next the coordinates of the structures in a low-dimensional PC subspace (typically three or four

ACS Paragon Plus Environment

4

Page 5 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

dimensions) are used to construct a multidimensional histogram; the histogram boundaries in each dimension being chosen to just enclose all N points. Following this, a set of N diverse unoccupied bins in the histogram is identified by an iterative procedure: the unoccupied bin most distant (in PC space) from any occupied bin is chosen, then the bin most distant from all occupied bins plus this new one, and so on until N bins are chosen. Finally the PC coordinates of the midpoints of the chosen bins are backtransformed to generate N new conformations of the molecule of interest, distinct from those in the initial set. In 16 we have shown that though these new structures are not guaranteed to be physically realistic, on assessment against the original NMR restraint data they can prove to be valid and valuable extensions to the ensemble. We hypothesised that this procedure, within an iterative simulation/analysis workflow, could be applied more generally as an enhanced conformational sampling method. The approach is similar to that discussed by Harada, but the CoCo approach has a significant distinguishing feature – while most existing weighted ensemble methods extend sampling each iteration by selecting “seed” structures that are drawn from extreme, or poorly-sampled regions of the so-far explored space, the CoCo method uniquely extends sampling each iteration starting from completely unvisited regions of that space (see Figure 1, also supplementary Figure S1). We hypothesised that these “hyperspace jumps” between the end of one trajectory and the start of the next could have a major impact on the ability of the method to rapidly explore new regions of conformational space.

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 40

Figure 1. Key steps in the CoCo-MD method. A: N short independent MD simulations are initiated from a starting conformation. B: The bounding box in a PC-space is identified. C: N unsampled regions within the bounding box are identified. D: New short simulations are run, starting from the points identified in C. The process then iterates: E: The new bounding box is identified. F: N new unsampled regions are selected and MD runs performed, etc. Note that there is no particular connection between individual simulations and the subsequently-identified new start points; i.e. the colour coding of the circles is arbitrary, but illustrates that one can regard the method as generating a set of discontinuous trajectories, where conventional MD-directed sections are separated by long-distance “jumps” through conformational space. Here we describe the implementation of the CoCo-MD workflow and benchmark its performance against conventional MD (CMD) simulations for a number of test systems. The first of these is the alanine penta-peptide, a popular test-bed for the investigation of simulation methods and performance17,18. We describe the construction of the reference CMD ensemble and characterise its performance in respect of two sampling tasks: a) the rate of sampling of conformational space in general, and b) the computational effort

ACS Paragon Plus Environment

6

Page 7 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

required to converge a value for the free energy difference between the extended and alpha helical conformational states. Then we analyse the sampling power of the CoCoMD method by itself to address task a), and show how the biased (non-Boltzmann) sampling produced by the method can be corrected for by combining CoCo-MD with the reservoir-REMD method 19,20 to address task b). At this point we then evaluate a faster, but approximate, method for unbiasing CoCo-MD simulation data using a “reservoirREMD-like” approach.

Next we explore the performance of the CoCo-MD method on more challenging systems, specifically the cyclic peptide cyclosporine A (CsA) and maltose binding protein (MBP). Cyclic peptides and their derivatives are of significant and increasing pharmaceutical interest. From the modelling perspective they present some particular challenges, due to the possibility of adopting a (sometimes large) variety of distinct conformational states that are separated by high energy barriers21,22. The ‘biologically relevant’ conformation can be hard to predict and may be dependent on the environment 23,24. MBP is an example of a protein that undergoes a large conformational change on

ligand binding that has been studied by a variety of computational techniques, including accelerated molecular dynamics (AMD) 25, umbrella sampling 26 and another PCA based

ACS Paragon Plus Environment

7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 40

conformational selection method 27. In this case comparison of CoCo-MD with AMD is particularly relevant as this is another unsupervised enhanced sampling method.

Computational Methods

Simulations for alanine penta-peptide and CsA were performed using AMBER12 28. Simulations for MBP were performed using Gromacs 2016 29. Models of Ace-Ala5-NMe were built in both the extended and helical state and parameterised using the parm99SB force field. Models for CsA were based on the coordinates of the centroid of the first cluster identified by Witek et al. 23 and parameterised using the Gaff forcefield and parameters from Khoury et al. 30. Each model was immersed in a truncated octahedral box of TIP3P water with a minimum buffer of 15 angstroms between any solute atom and the box boundary. The solutes are electrically neutral and no salt ions were added. Models for MBP were constructed using coordinates from both PDB entry 1ANF (31, closed state) and 1OMP (32, open state) and parameterised using the parm03 force field, using default protonation/tautomer states for all residues. Each model was immersed in a rectangular box of TIP3P water with a minimum buffer of 15 angstroms between any solute atom and

ACS Paragon Plus Environment

8

Page 9 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the box boundary. Eight waters were substituted with sodium ions to achieve electroneutrality.

For all systems, relaxation and production simulations were run at 300K in an NPT ensemble, using Berendsen thermostats and barostats. The particle-mesh Ewald summation (PME) method was used for long-range electrostatic interactions, SHAKE (or LINCS in the case of Gromacs simulations) was applied to all bonds, and a 2 fs time step was used. Snapshots were saved every ps. Examples of input files for both Amber and Gromacs MD runs are included in the supplementary material, and detail all parameters used.

Accelerated MD (AMD) simulations on CSA were performed using boost potentials on both the total potential and torsional energy terms as calculated using the approach described by Pierce et al.9 from energy values observed in the conventional MD simulations. The values were: EthreshD = 159, alphaD = 8, EthreshP = -16316, alphaP = 1144.

Solute-tempering REMD simulations (REST2) 33 of the alanine pentapeptide were ran using Gromacs 2016.4 interfaced with Plumed 2.4 34 Three simulations were performed,

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 40

each containing eight replicas with scaling factor (λ) values ranging from 1 (reference system) to 0.25 following a geometric distribution. The acceptance ratios fluctuated between 23% to 46%, with exchanges attempted at every 2 ps. Since GROMACS exchanges coordinates, the replica with a λ = 1 (reference) was employed for the conformational analysis.

Reservoir-REMD simulations 19,20 were run using Amber12. Ten replicas were assigned temperatures from 300 to 336 K in 4 degree intervals – a spacing that produced an exchange success probability of between 0.2 and 0.3. All trajectories began from the extended state and were equilibrated at their selected temperatures for 2 ns before the REMD phase began, during which exchanges were attempted every picosecond. All structures from the CoCo-MD simulation were used as the reservoir, taken to have a temperature of 340 K and with all conformations being given equal weight (AMBER rremd option 2).

Details of the fast approximate “reservoir-REMD-like” like method to correct CoCo-MD generated conformational ensembles is provided in the supplementary material.

Principal component analysis was performed using the pyPcazip toolkit35. The method used for conformational analysis of alanine pentapeptide was as follows. A reference PCA

ACS Paragon Plus Environment

10

Page 11 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

model was calculated from the two hundred independent 10ns conventional MD simulations (see below). From the PCA analysis of the trajectory ensemble, the scores for each snapshot in the top three dimensions were calculated and discretised into 30 bins. This process assigns each snapshot to a voxel in the 3D PC space. All further MD simulations were then analysed with reference to this master PCA model and discretisation of conformational space. By treating the 3D histogram as a 3D image, the watershed image segmentation method implemented in the scikit-image Python library was used to assign each voxel to a basin of attraction, and so each snapshot to a conformational state. Trajectories from all further simulations of alanine pentapeptide were discretised using the same PCA model and state definitions. Free energy differences between conformational states were calculated from the ratio of state occupancies. For CSA, PCA was performed on the aggregated data from all simulations (CMD, AMD and CoCo-MD) using heavy atom coordinates. For MBP we followed the method of Bucher et al 25, performing PCA on 30 structures of the protein from the PDB and mapping the sampling of the MD simulations in subspace defined by the first two PCs from this.

CoCo-MD workflows were written in Python, using the ExTASY wrappers (www.bitbucket.org/extasy-project/wrappers) to interface the MD and CoCo analysis (pyCoCo:

www.bitbucket.org/extasy-project/coco)

codes.

Large-scale

CoCo-MD

simulations were performed using the ExTASY domain specific workflow system (which in

ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 40

turn uses Ensemble-toolkit) 36. A flow diagram for the CoCo-MD method is shown in the supplementary data (Figure S1). Other data analysis was performed using MDTraj 37.

Results and Discussion

1. Studies on the alanine pentapeptide. Construction and Analysis of the Reference Dataset. One hundred independent 10 ns conventional MD (CMD) simulations were run starting from each of the extended and helical states of the peptide, differing in the initial assignment of randomised velocities at 300 K. Coordinates were saved every picosecond, giving a total of two million configurations representative of two microseconds of aggregated simulation time. After iterative, “Procrustes”, least-squares fitting38, we performed PCA in Cartesian space on the peptide heavy atoms and calculated the coordinates for each snapshot in the threedimensional subspace defined by the three top PCs (which capture 32%, 13%, and 10% of the total variance respectively). The scores data was used to create a 3D histogram with 30 bins in each dimension. To minimise the bias in the histogram due to the starting structures, we trimmed increasing numbers of snapshots from the beginning of the trajectories and then calculated the Kullback-Leibler divergence between the histograms generated from the first and second halves of the remaining data. The divergence had a

ACS Paragon Plus Environment

12

Page 13 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

clear minimum when the initial 2.5 ns of each 10 ns trajectory was discarded (see supplementary figure S3). By watershed partitioning of the density distribution from the whole of the equilibrated data (see methods) we could identify seven local minimum states (Figure 2a). State/cluster 1 is the most populated and corresponds to the unfolded state. State/cluster 5 corresponds to the alpha helical state. By population counting, the helical state has a free energy of about 1.6 kT above the extended state; if we consider the simulations begun from the extended and helical states independently, the values are 1.5 kT and 1.6 kT respectively.

Considering just the CMD simulations begun from the extended state, we examined how the Cartesian coordinate variance in groups of 10 out of the 100 individual walkers (i.e., aggregating data from trajectories 0-9, 10-19, etc.) changed with the length of the simulations. This metric measures the diversity of the ensemble, whose maximisation is the principal aim of the CoCo-MD method. Similarly, for each group of 10 walkers we observed the convergence in the estimate for the free energy difference between the extended and alpha-helical states (Figure 2c). a) 1

2

3 5

4 6

7

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 40

Figure 2. a): examples of the seven conformational states identified in the reference CMD ensemble of alanine pentapeptide. b): ensemble diversity as a function of simulation time for groups of ten independent walkers begun from the fully extended state; error bars are +/- one standard error (N=10). c): convergence in the estimate for the free energy difference between the extended and alpha-helical states for ensembles of ten walkers; errors bars are +/- one standard error (N=10).

Figure 2b shows that the diversity in the ensemble increases steadily to begin with but tails off towards the end of the simulations. Figure 2c shows that the estimate of the free energy difference between the extended and alpha helical states takes approximately 80 nanoseconds of aggregate simulation time (recalling that each data point comes from the analysis of ten walkers) to converge to a value of about 1.5 kT with a standard error of 0.2 kT. A similar analysis of simulations begun from the helical state gives a value of 1.6 kT with a standard error of 0.3 kT (see supplementary figure S4). With this reference data in hand, we then examined the ability of the CoCo-MD method to sample the conformational space of the alanine pentapeptide more rapidly.

Performance of the CoCo-MD Method. Our first CoCo-MD dataset was generated as follows. Ten independent simulations of 0.02 ns were run, saving a snapshot every picosecond, starting from the extended conformation. The 200 snaphots thus generated

ACS Paragon Plus Environment

14

Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

were analysed using the CoCo approach in three (principal component) dimensions, and ten new start points generated. As discussed above, CoCo-generated structures are not guaranteed to by physically realistic, so they were not used directly as start point for the next round of simulations, rather they were used as the target structures for short restrained MD simulations (10 ps, force constant 1 kCal.mol-1 on each heavy atom) that started from the well-equilibrated initial structure. The final structures from these restrained simulations – which though of potentially high energy will be physically realistic due to the modest force constant used in the targeted MD process - were then used as start points for the next 0.02 ns production phase MD. The trajectory data from these simulations was then added to that from the first round of simulations, and the CoCo analysis performed again. This loop was repeated 100 times in total, to give a total of 20000 conformations representing an aggregate of 20ns of simulation time.

The performance of the CoCo-MD dataset is shown in Figure 3. We see (Figure 3a) that CoCo-MD creates a very diverse conformational ensemble very rapidly, reaching a maximum at around 5 ns (aggregate simulation time) and then slightly falling away beyond this. We interpret this as a rapid exploration of conformational space up to the physical limits enforced by the chemical structure of the peptide, followed by some ‘filling in’ of intermediate conformational states.

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 40

Though the CoCo-MD method itself does not generate a Boltzmann weighted ensemble, we hypothesised that it might be a very effective ‘pre-processor’ for other methods that can, such as reservoir-REMD. In reservoir-REMD the highest temperature replica simulation is replaced by a static ensemble of structures with pre-computed potential energies. During the r-REMD simulation, at each exchange timepoint one of these is drawn at random from the ensemble for potential exchange with the conventional simulation running at the next-highest temperature.

Though in its original incarnation

the reservoir was expected to have been pre-computed by a method that gave something close to a Boltzmann distribution (albeit at some high temperature that facilitated conformational sampling), it was subsequently shown that this is not required and the approach is still rigorous. When we used the ensemble of 20000 structures from the CoCoMD simulation as the reservoir in a r-REMD simulation (see methods section for details), we could converge a value for the free energy difference (1.4 kT, standard error 0.2, N=4) that is in good agreement with the reference value within 1-2 nanoseconds of additional simulation (per replicate, i.e. 10-20 ns aggregate simulations time) (Figure 3b). For comparison we also performed a conventional solute-tempering REMD simulation (see methods section for details). Within an equivalent amount of aggregate simulation effort this method gives a value of 1.7 kT (standard error 0.1 kT, N=3), see Figure 3c.

ACS Paragon Plus Environment

16

Page 17 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3. a): Ensemble diversity as a function of simulation time in the CoCo-MD simulation (orange) compared to that from the CMD simulations (blue) over the same time period. b): convergence in the free energy difference in subsequent R-REMD simulations; each line relates to an independent simulation. c): time evolution of the free energy difference in comparator solute tempering-REMD simulations; each line relates to an independent simulation. d): convergence in the free energy difference calculated using three independent runs of the fast resampling-based approach (see text for details).

Evaluation of a fast, approximate, alternative to reservoir-REMD for unbiasing CoCo-MD simulations. We surmised that a process that involved drawing at random

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 40

(with replacement) from a CoCo-MD generated ensemble in such a way that the new sample had an energy distribution appropriate to an equilibrium simulation of the same system at normal temperatures would be a quick, if approximate alternative to the reservoir-REMD method. Details of the algorithm, and evaluation of its performance on a ‘toy’ system, are included in the supplementary material. Applied to the alanine pentapeptide data the results are quite encouraging. Using the original PCA-based discretization of the coordinates of each sample (20 bins for each of the top three PCs) plus a discretization of the potential energy into 70 bins to define the microstates, the synthetic ensemble, resampled from the original and of equal size but reweighted to have a potential energy distribution that matched CMD simulations, gives a value of 1.5 kT for the free energy difference between the extended and helical states, without the requirement for any further simulations (Figure 3d).

Impact of CoCo-MD Parameters on Performance. The initial CoCo-MD dataset was generated using a workflow that used 10 independent walkers and interleaved 100 cycles of 0.02ns MD with 100 CoCo analyses of the aggregating data. We surmised that optimal performance, for a set total amount of simulation time, would come from the right balance between the number of walkers, the lengths of individual MD simulations, and the number of CoCo-based “jumps”.

ACS Paragon Plus Environment

18

Page 19 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The first parameter we explored was the frequency with which MD simulations are interrupted for CoCo-directed “jumps” to new, so-far unsampled, regions of space. A high jump frequency might mean many new areas are sampled, but the intervening unsupervised MD stages would be short, and so restricted in their ability to a) explore the local region of space and b) sample outside the PC subspace that was defined by the CoCo procedure, thus making the process less adaptive.

For this evaluation we fixed the number of independent walkers to ten, as used above, and fixed the total length of each walker’s MD simulation at 2 ns – also as above. Each run was repeated three times, varying the initial randomized velocity assignments.

The results (Figure 4a) show that, by chance, our initial choice of 100 CoCo-driven jumps over 2 ns of simulation was a good one. Although performance is not that sensitive to jump frequency above 25 jumps per nanosecond, below this jump rate it degrades significantly.

The second parameter we explored was the total number of walkers to use. A large number of walkers will permit the simultaneous exploration of many different regions of conformational space, but, if we insist the total aggregate simulation time is kept constant, the length of each MD is shorter. Again, this raises the possibility that the process

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 40

becomes less adaptive, as PCs identified in early rounds of the process are less likely to be significantly challenged by the short interleaving unsupervised MD stages.

The results show however (Figure 4b) that this does not seem to be an issue. It would appear that as the number of walkers increases, the loss in diversity in the ensemble due to the shorter nature of each MD simulation is well balanced by the increase in diversity due to the increased number of CoCo-generated alternative simulation start points.

Figure 4. Performance of the CoCo-MD method as a function of selected parameters. The metric is the ensemble variance (see text for details). Error bars are standard deviations calculated from three independent experiments. Left: performance as a function of the frequency of CoCo-driven ‘jump’ steps during the MD simulations, for a constant number (ten) of independent walkers. Right: performance as a function of the number of walkers,

ACS Paragon Plus Environment

20

Page 21 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

for a constant number of CoCo-jumps (100) and a constant aggregate simulation time (2 ns).

2. Studies on cyclosporine A. Simulation setup. As described in the methods section above, all simulations of CsA began from the coordinates of the centroid of the first cluster identified by Witek et al. from their 10 microsecond simulations of the peptide in water23,24. We used 10 walkers, with randomised initial velocities and each walker was followed for 2ns, either as a continuous, single, trajectory (control CMD simulations and AMD simulations), or as a series of one hundred 0.02 ns MD simulations interspersed with CoCo ‘jumps’ (CoCo-MD simulations). In view of the more complex geometry of CsA in comparison to pentaalanine, CoCo analysis used the distribution of the trajectory snapshots in the subspace defined by the top four (rather than three) eigenvectors from PCA and discretised their distribution into 13 bins in each dimension (rather than 30), so that the total number of bins was roughly conserved (134 = 28561; 303 = 27000).

Analysis of sampling. We combined the trajectory data from the CMD and AMD simulations with that from the CoCo-MD simulations, added the 1471 cluster centres identified by Witek et al., and performed PCA on the aggregate. Figure 5 shows the

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 40

resulting distributions of structures in the PC1/PC2 plane. As expected, CMD simulations of 2 ns each – even when there are ten replicates – fail to sample much of the space reached by Witek et al. in their 100 x 100 ns simulations (Figure 5a). The AMD simulations do not do much better (Figure 5b), perhaps because though the AMD method significantly flattens the torsional potential barriers, conformational changes in such constrained ring systems are impeded more significantly by non-bonded interactions. Though in recent work the AMD method has been shown to be effective for other cyclic peptides22, the simulations used there were of the order of hundreds of nanoseconds. In contrast, the CoCo-MD simulations not only sample all the space found by Witek et al., but – even allowing that the reference structures are the centroids of clusters, not the full distribution - much more besides (Figure 5c). Analysis of Ramachandran maps (Figure 6, Figure S5) confirms the ability of the CoCo-MD method to very effectively sample phi/psi space. Figure S5 may be directly compared with Figure S4 in 24.

Analysis of the potential energy distribution for the CoCo-MD ensemble shows that it is shifted to higher energies compared to an equivalent CMD simulation (Figure 5d). However, even after we apply our reweighting method to correct for this (using the original four-dimensional PCA discretization for coordinate binning together with 70 bins for the potential energy to define microstates), the distribution of the ensemble remains very broad (Figure 5e). Interestingly, the population density plot (Figure 5f) shows that

ACS Paragon Plus Environment

22

Page 23 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

many of the diverse, thermally-accessible, conformations identified by CoCo-MD are probably rarely observed states. The ability of the method to find them is encouraging – though rare, as thermally accessible they could be bio-active.

Figure 5. Performance of the CoCo-MD method for conformational sampling of CsA, comparing distributions in the common 2D subspace determined by PCA. Panel a: distribution of the centroids from the work of Witek et al. (magenta crosses), compared to the sampling obtained using CMD (blue dots), starting from the centroid structure indicated with the magenta circle. Panel b: equivalent plot with sampling obtained using AMD shown as cyan dots. Panel c: equivalent plot with sampling obtained using CoCoMD shown as orange dots. Panel d: potential energy distributions from the CMD (blue) and CoCo-MD (orange) simulations. Panel e: CoCo-MD data after resampling to match

ACS Paragon Plus Environment

23

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 40

the CMD potential energy distribution. Panel f: Contour plot for reweighted CoCo-MD data, contours are at 1, 2, 3, 4, and 5 times the mean sample density.

Figure 6. Performance of the CoCo-MD method for conformational sampling of CsA, showing Ramachandran maps for residue 5 as an example. Left panel: data for the centroids from Witek et al. Centre panel: data from the 10 x 2ns CMD simulations. Right panel: data from the 10 x 2 ns CoCo-MD simulations. Data for all amino acids is given in supplementary figure S5.

As another test of diversity in the CoCo-MD generated ensemble, we assigned a 33letter code to each sample, recording the regions of phi and psi space (gauche-plus, trans, and gauche-minus, “p”, “t”, and “m”) and of omega space (cis or trans, “c” or “t”) for each of the 11 amino acids. By this metric, we find that the CoCo-MD ensemble contains a total of 9,822 unique conformational states within its 20,000 samples. For comparison, the equally-sized ensemble generated by the CMD simulations contains just 2,224 unique conformational states, and the AMD simulations identified 5,912 states.

ACS Paragon Plus Environment

24

Page 25 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3. Studies on MBP. Simulation setup. As described in the methods section, models of MBP in both the ‘open’ and ‘closed’ states were generated. Control CMD simulations of 200 ns were run for each, sampling the trajectory every 1 ps. The CoCo-MD simulation begun from the ‘open’ state, used 10 walkers, each of which was followed over 40 cycles of 0.5 ns MD interspersed with CoCo “jumps”, resulting in an aggregate simulation time of 200 ns. CoCo analysis was performed using the top 4 eigenvectors and discretising sampled space into 13 bins in each dimension.

Analysis of sampling. In this case we followed the approach of Bucher at el. 25 and analysed sampling in the 2D space defined by first two principal components of an ensemble of 30 crystal structures (see Supplementary data for details). Results are shown in Figure 7, which may be directly compared with Figure 2 in Bucher et al. We see that both CMD simulations tend to drift away from their start points towards some form of intermediate conformation. The simulation begun from the closed state in particular drifts away from the crystal structure – not surprising, since there is no ligand present in the simulation. The enhanced sampling obtained with CoCo-MD is striking. Begun from the ‘open’ conformation, the ensemble envelops that obtained by the same amount of CMD begun from the same starting structure, and even envelops space sampled by the CMD

ACS Paragon Plus Environment

25

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 40

begun from the closed structure. There is also significant sampling of “super-open” conformations (large positive values of PC1). NMR structure 2MV0 39 provides the extreme point in PC2 – we also see significant sampling towards this state that is only seen in solution structures.

Figure 7. Analysis of sampling from CMD and CoCo-MD simulations of MBP. In panels a), b), c), e) and f) black dots mark the positions of crystal structures in the “closed” (lefthand cluster) and “open” (right-hand cluster) conformations. The lone point towards the top of each plot marks the unusual conformation in NMR structure 2MV0. Panel a): cMD

ACS Paragon Plus Environment

26

Page 27 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

simulations begun from the “open” state (1OMP). Panel b): cMD simulations begun from the “closed” state (1ANF). Panel c): CoCo-MD simulations begun from the “open” state. Panel d): Potential energy distributions in the reference CMD simulation (blue) and CoCo-MD simulations (orange). Panel e) Distribution of structures generated by resampling from the CoCo-MD structural ensemble according to the CMD potential energy distribution. Panel f): Population density plot for the reweighted ensemble, contoured at 0.2, 0.4, 0.6 and 0.8 of the maximum value observed.

The structures generated by CoCo-MD have a more positive potential energy distribution than those in the reference CMD simulations (Figure 7d), but the distributions overlap. Applying our reweighting/resampling procedure, the conformational diversity of the ensemble is only very slightly perturbed (Figure 7e). NMR data40 fits a model where the protein exists 95% of the time in the open state, and 5% in a “semi closed” state. Gratifyingly, the reweighted ensemble shows a small local maximum close to – but not at – the ‘closed’ state. If we take PC1 = -25 as the dividing point, then our reweighted CoCoMD simulations give a value of 92:8 for the open:closed ratio.

Conclusions

ACS Paragon Plus Environment

27

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 40

We have found that within the context of an iterative simulation/data analysis workflow, the CoCo ensemble expansion method is a very effective approach to enhanced sampling. The method does not require any adaptations to the MD code, so this can run at its fullyoptimised speed. The method is unsupervised – the user does not need to specify in advance the “interesting” reaction coordinates, these emerge and adapt automatically as the sampling progresses. There could be concern that for large, complex, systems the focus on enhancing sampling in the space of a relatively small (3 or 4) number of collective coordinate dimensions could limit the effectiveness of the method but so far we see no evidence of this. The adaptive nature of the procedure, in which PCs are recalculated each cycle, and the intervening unbiased MD simulation stages, seem to mitigate this issue. In complex cases where there is a priori information as to where enhanced sampling is required (e.g. in a single domain or region of a protein), then the CoCo part of the CoCoMD method can be “focussed” on this area, so sampling is not restricted to the most dominant modes of motion of the protein as a whole. Applied to a new system, it would certainly be appropriate to conduct short scaling runs to determine the optimal choice of number of PC dimensions, and number of bins per dimension for the sample histograms. Many decisions will be influenced in practice by the scale of available computing resources, but we have evidence that the number of walkers, and interval between CoCo “jumps” are not critical performance parameters, so there are few variables to explore. How one should decide when to terminate the enhanced sampling process will benefit

ACS Paragon Plus Environment

28

Page 29 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

from further investigation. The key metric the CoCo-MD process will maximise is the variance in the Cartesian coordinates of the ensemble. In this work, this only plateaued, within the number of cycles run, for the alanine penta-peptide example. However through the reweighting procedure good results were obtained for all systems studied.

Applied to the classic alanine penta-peptide test-case, CoCo-MD samples space about ten times faster than conventional MD. Though the resulting ensemble is not Boltzmannweighted, by using it as the reservoir for an R-REMD simulation we can obtain free energy differences between conformational states quite straightforwardly and at significantly reduced overall computational cost. For example, to converge a value for the free energy difference between the extended and alpha-helical states of the peptide requires about 700 ns (70 ns x 10 walkers) of conventional MD simulation, while the CoCo-MD/R-REMD method requires just 40 ns (2 ns x 10 walkers for CoCo-MD, then 2 ns x 10 replicas for RREMD).

If a rigorous estimate of free energies is not required, a simple

resampling/rescaling procedure can be employed instead of R-REMD, at almost no additional computational cost. Within an envelope of total computational effort expended, the performance of the CoCo-MD method is not particularly sensitive to key parameters such as the frequency of “jumps” or number of independent walkers.

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 40

Applied to cyclosporine A, a rich ensemble of alternative, energetically reasonable, conformational states for the cyclic peptide can be generated with a fraction of the effort required to do the same by conventional MD. Applied to MBP, this unsupervised method is very effective in sampling in the region of the closed, ligand-binding, state even when the simulation is begun from the open conformation, and no ligand is present.

The results demonstrate the potential of flexible workflows in simulation science, and the value of developing tools that maximise the possibilities for interfacing established and new computational methods in as seamless and automatable way as possible. We are now studying how the CoCo-MD and R-REMD interface can be streamlined, and how the method may be applied to the identification of rare conformational states and cryptic binding pockets in proteins.

AUTHOR INFORMATION Corresponding Author *Charles Laughton, Centre for Biomolecular Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK. Email: [email protected] Present Addresses §Hartree

Centre, STFC Daresbury Laboratory, Scientific Computing Department, Cheshire

WA4 4AD, UK.

ACS Paragon Plus Environment

30

Page 31 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

°School of Life and Medical Sciences, Department of Pharmacy, Pharmacology and Postgraduate Medicine, University of Hertfordshire, College Lane, AL10 9AB, UK

Author Contributions The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

ACKNOWLEDGMENT We thank Sereina Riniker for providing us with data for the CsA cluster centres (reference 22). This work is a product of the ExTASY consortium (http://www.extasyproject.org) supported by EPSRC Grant EP/K039490/1 and NSF SSI Awards (CHE1265788 and CHE-1265929). This work used the ARCHER UK National Supercomputing Service (http://www.archer.ac.uk), with additional access through HECBioSim (EPSRC Grant EP/L000253/1), the University of Nottingham HPC service, and we acknowledge access to XSEDE computational facilities via TG-MCB090174.

SUPPORTING INFORMATION

ACS Paragon Plus Environment

31

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 40

Examples of the input files used for Amber and Gromacs MD simulations; a flow diagram of the CoCo-MD procedure (Figure S1); a full description of the ensemble reweighting procedure and its validation on a simple model system; the results of the Kullback-Leibler test for equilibrated sampling in the penta-alanine reference ensemble (Figure S3); a graph of the convergence in the estimate for the extended/alpha-helical free energy difference for reference conventional MD simulations of penta-alanine begun from the alpha-helical state (Figure S4); Ramachandran maps for the sampling of cyclosporine A by various methods (Figure S5); list of the PDB codes of maltose binding protein structures used to construct the PC model used for analysis of the MBP sampling performance. This information is available free of charge via the Internet at http://pubs.acs.org

REFERENCES (1)

Narumi, T.; Yasuoka, K.; Taiji, M.; Zerbetto, F.; Höfinger, S. Fast Calculation of Electrostatic Potentials on the GPU or the ASIC MD-GRAPE-3. Comput. J. 2011, 54 (7), 1181–1187.

(2)

Shaw, D. E.; Chao, J. C.; Eastwood, M. P.; Gagliardo, J.; Grossman, J. P.; Ho, C. R.; Lerardi, D. J.; Kolossváry, I.; Klepeis, J. L.; Layman, T.; et al. Anton, a Special-Purpose Machine for Molecular Dynamics Simulation. Commun. ACM 2008, 51 (7), 91.

ACS Paragon Plus Environment

32

Page 33 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(3)

Minary, P.; Tuckerman, M. E.; Martyna, G. J. Long Time Molecular Dynamics for Enhanced Conformational Sampling in Biomolecular Systems. Phys. Rev. Lett. 2004, 93 (15).

(4)

Perez, J. J.; Tomas, M. S.; Rubio-Martinez, J. Assessment of the Sampling Performance of Multiple-Copy Dynamics versus a Unique Trajectory. J. Chem. Inf. Model. 2016, 56 (10), 1950–1962.

(5)

Atzori, A.; Bruce, N. J.; Burusco, K. K.; Wroblowski, B.; Bonnet, P.; Bryce, R. A. Exploring Protein Kinase Conformation Using Swarm-Enhanced Sampling Molecular Dynamics. J. Chem. Inf. Model. 2014, 54 (10), 2764–2775.

(6)

Noé, F.; Schütte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. R. Constructing the Equilibrium Ensemble of Folding Pathways from Short Off-Equilibrium Simulations. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (45), 19011–19016.

(7)

Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314 (1–2), 141–151.

(8)

Barducci, A.; Bonomi, M.; Parrinello, M. Metadynamics. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2011, pp 826–843.

(9)

Pierce, L. C. T.; Salomon-Ferrer, R.; Augusto F. De Oliveira, C.; McCammon, J. A.; Walker, R. C. Routine Access to Millisecond Time Scale Events with Accelerated

ACS Paragon Plus Environment

33

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 40

Molecular Dynamics. J. Chem. Theory Comput. 2012, 8 (9), 2997–3002. (10) Pan, A. C.; Weinreich, T. M.; Piana, S.; Shaw, D. E. Demonstrating an Order-ofMagnitude Sampling Enhancement in Molecular Dynamics Simulations of Complex Protein Systems. J. Chem. Theory Comput. 2016, 12 (3), 1360–1367. (11) Preto, J.; Clementi, C. Fast Recovery of Free Energy Landscapes via Diffusion-MapDirected Molecular Dynamics. Phys. Chem. Chem. Phys. 2014, 16, 19181–19191. (12) Zheng, W.; Rohrdanz, M. A.; Maggioni, M.; Clementi, C. Polymer Reversal Rate Calculated via Locally Scaled Diffusion Map. J. Chem. Phys. 2011, 134 (14). (13) Harada, R.; Kitao, A. Nontargeted Parallel Cascade Selection Molecular Dynamics for Enhancing the Conformational Sampling of Proteins. J. Chem. Theory Comput. 2015, 11 (11), 5493–5502. (14) Harada, R.; Takano, Y.; Baba, T.; Shigeta, Y. Simple, yet Powerful Methodologies for Conformational Sampling of Proteins. Phys. Chem. Chem. Phys. 2015, 17 (9), 6155– 6173. (15) Peng, J.; Zhang, Z. Simulating Large-Scale Conformational Changes of Proteins by Accelerating Collective Motions Obtained from Principal Component Analysis. J. Chem. Theory Comput. 2014, 10 (8), 3449–3458. (16) Laughton, C. A.; Orozco, M.; Vranken, W. COCO: A Simple Tool to Enrich the

ACS Paragon Plus Environment

34

Page 35 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Representation of Conformational Variability in NMR Structures. Proteins Struct. Funct. Bioinforma. 2009, 75 (1), 206–216. (17) Margulis, C. J.; Margulis, C. J.; Stern, H. a.; Stern, H. a.; Berne, B. J.; Berne, B. J. Helix Unfolding and Intramolecular Hydrogen Bond Dynamics in Small α-Helices in Explicit Solvent. J. Phys. Chem. B 2002, 106 (41), 10748–10752. (18) Hegefeld, W. A.; Chen, S. E.; Deleon, K. Y.; Kuczera, K.; Jas, G. S. Helix Formation in a Pentapeptide: Experiment and Force-Field Dependent Dynamics. J. Phys. Chem. A 2010, 114 (47), 12391–12402. (19) Okur, A.; Roe, D. R.; Cui, G.; Hornak, V.; Simmerling, C. Improving Convergence of Replica-Exchange Simulations through Coupling to a High-Temperature Structure Reservoir. J. Chem. Theory Comput. 2007. (20) Roitberg, A. E.; Okur, A.; Simmerling, C. Coupling of Replica Exchange Simulations to a Non-Boltzmann Structure Reservoir. J. Phys. Chem. B 2007. (21) Kessler, H. Conformation and Biological Activity of Cyclic Peptides. Angewandte Chemie International Edition in English. 1982. (22) Kamenik, A. S.; Lessel, U.; Fuchs, J. E.; Fox, T.; Liedl, K. R. Peptidic Macrocycles Conformational Sampling and Thermodynamic Characterization. J. Chem. Inf. Model. 2018, 58, 982–992.

ACS Paragon Plus Environment

35

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 40

(23) Witek, J.; Keller, B. G.; Blatter, M.; Meissner, A.; Wagner, T.; Riniker, S. Kinetic Models of Cyclosporin A in Polar and Apolar Environments Reveal Multiple Congruent Conformational States. J. Chem. Inf. Model. 2016, 56 (8), 1547–1562. (24) Witek, J.; Mühlbauer, M.; Keller, B. G.; Blatter, M.; Meissner, A.; Wagner, T.; Riniker, S. Interconversion Rates between Conformational States as Rationale for the Membrane Permeability of Cyclosporines. ChemPhysChem 2017. (25) Bucher, D.; Grant, B. J.; Markwick, P. R.; McCammon, J. A. Accessing a Hidden Conformation of the Maltose Binding Protein Using Accelerated Molecular Dynamics. PLoS Comput. Biol. 2011, 7 (4). (26) Mascarenhas, N. M.; Kästner, J. How Maltose Influences Structural Changes to Bind to Maltose-Binding Protein: Results from Umbrella Sampling Simulation. Proteins Struct. Funct. Bioinforma. 2013, 81 (2), 185–198. (27) Harada, R.; Shigeta, Y. Efficient Conformational Search Based on Structural Dissimilarity Sampling: Applications for Reproducing Structural Transitions of Proteins. J. Chem. Theory Comput. 2017, acs.jctc.6b01112. (28) Case, D. A.; Darden, T. A.; Cheatham, III, T. E.; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Walker, R. C.; Zhang, W.; Merz, K. M.; et al. AMBER 12. University of California, San Freancisco 2012.

ACS Paragon Plus Environment

36

Page 37 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(29) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindah, E. Gromacs: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015. (30) Khoury, G. A.; Smadbeck, J.; Tamamis, P.; Vandris, A. C.; Kieslich, C. A.; Floudas, C. A. Forcefield-NCAA: Ab Initio Charge Parameters to Aid in the Discovery and Design of Therapeutic Proteins and Peptides with Unnatural Amino Acids and Their Application to Complement Inhibitors of the Compstatin Family. ACS Synth. Biol. 2014, 3 (12), 855–869. (31) Quiocho, F. A.; Spurlino, J. C.; Rodseth, L. E. Extensive Features of Tight Oligosaccharide

Binding

Revealed

in

High-Resolution

Structures

of

the

Maltodextrin Transport/Chemosensory Receptor. Structure 1997, 5 (8), 997–1015. (32) Sharff, A. J.; Rodseth, L. E.; Spurlino, J. C.; Quiocho, F. A. Crystallographic Evidence of a Large Ligand-Induced Hinge-Twist Motion Between the 2 Domains of the Maltodextrin Binding-Protein Involved in Active-Transport and Chemotaxis. Biochemistry 1992, 31 (44), 10657–10663. (33) Wang, L.; Friesner, R. A.; Berne, B. J. Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (REST2). J. Phys. Chem. B 2011.

ACS Paragon Plus Environment

37

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 40

(34) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun. 2014. (35) Shkurti, A.; Goni, R.; Andrio, P.; Breitmoser, E.; Bethune, I.; Orozco, M.; Laughton, C. A. PyPcazip: A PCA-Based Toolkit for Compression and Analysis of Molecular Simulation Data. SoftwareX 2016, 5, 44–50. (36) Balasubramanian, V.; Treikalis, A.; Weidner, O.; Jha, S. Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks. In Proceedings of the International Conference on Parallel Processing; 2016. (37) McGibbon, R. T.; Beauchamp, K. A.; Harrigan, M. P.; Klein, C.; Swails, J. M.; Hern??ndez, C. X.; Schwantes, C. R.; Wang, L. P.; Lane, T. J.; Pande, V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109 (8), 1528–1532. (38) Dryden, I. L.; Mardia, K. V. Statistical Shape Analysis. Stat. Med. 2000, 19 (19), 2716– 2717. (39) Lange, O. F.; Rossi, P.; Sgourakis, N. G.; Song, Y.; Lee, H.-W.; Aramini, J. M.; Ertekin, A.; Xiao, R.; Acton, T. B.; Montelione, G. T.; et al. Determination of Solution Structures of Proteins up to 40 KDa Using CS-Rosetta with Sparse NMR Data from Deuterated Samples. Proc. Natl. Acad. Sci. 2012.

ACS Paragon Plus Environment

38

Page 39 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(40) Tang, C.; Schwieters, C. D.; Clore, G. M. Open-to-Closed Transition in Apo MaltoseBinding Protein Observed by Paramagnetic NMR. Nature 2007, 449 (7165), 1078– 1082.

ACS Paragon Plus Environment

39

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 40

for Table of Contents use only

CoCo-MD: A Simple and Effective Method for the Enhanced Sampling of Conformational Space

Ardita Shkurti†§, Ioanna Danai Styliari°†, Vivek Balasubramanian‡, Iain Bethune#, Conrado Pedebos†⊥, Shantenu Jha‡, Charles A. Laughton†*

Conventional MD or CoCo-MD Cyclosporine A

ACS Paragon Plus Environment

40