MasterMSM: A Package for Constructing Master Equation Models of

4 days ago - Markov state models (MSMs) have become one of the most important techniques for understanding biomolecular transitions from classical ...
0 downloads 0 Views 2MB Size
Application Note Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

pubs.acs.org/jcim

MasterMSM: A Package for Constructing Master Equation Models of Molecular Dynamics David de Sancho*,†,‡ and Anne Aguirre‡ †

University of the Basque Country, Faculty of Chemistry, Paseo Manuel Lardizabal, 3, 20018 Donostia-San Sebastián, Spain Donostia International Physics Center, 20018, Donostia-San Sebastián, Spain

Downloaded via UNIV STRASBOURG on August 21, 2019 at 05:32:31 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



ABSTRACT: Markov state models (MSMs) have become one of the most important techniques for understanding biomolecular transitions from classical molecular dynamics (MD) simulations. MSMs provide a systematized way of accessing the long time kinetics of the system of interest from the short-time scale microscopic transitions observed in simulation trajectories. At the same time, they provide a consistent description of the equilibrium and dynamical properties of the system of interest, and they are ideally suited for comparisons against experiment. A few software packages exist for building MSMs, which have been widely adopted. Here we introduce MasterMSM, a new Python package that uses the master equation formulation of MSMs and provides a number of new algorithms for building and analyzing these models. We demonstrate some of the most distinctive features of the package, including the estimation of rates, definition of core-sets for transition based assignment of states, the estimation of committors and fluxes, and the sensitivity analysis of the emerging networks. The package is available at https://github.com/daviddesancho/MasterMSM.



INTRODUCTION Molecular dynamics (MD) simulations with classical force fields are an essential part of the modern toolbox of molecular biophysics.1 In the past, the application of MD has been limited due to the well-known problems in force field accuracy and sampling limitations. However, these have been largely overcome due to community-wide efforts in force field optimization2−5 and the emergence of GPUs6 and dedicated computing architectures for MD.7 A third problem emerges from the sheer size of the data sets resulting from the simulations, which potentially contain information on atomic coordinates of thousands of atoms with femtosecond resolution. Network models of conformational dynamics provide a framework for a data-driven approach to distill information from these data sets.8 Given a relevant partitioning of the system into metastable states, one may attempt to capture the dynamics in the form of a master equation assuming that transitions between states are memoryless.9 This is the main goal of Markov state models (MSMs), which in the past few years have become one of the standard tools for the analysis of MD simulations.10−12 Two excellent software packages have been released for this purpose: PyEMMA13 and MSMBuilder.14 Both are written in Python and carry similar functionalities, including methods for dimensionality reduction for the original coordinate data sets, clustering algorithms, and estimators for the dynamical models. Much of the functionality of PyEMMA is contained within the HTMD platform,15 a single tool that comprises the whole MD life cycle including system preparation, data generation, and analysis. Here we present MasterMSM, a Python package for © XXXX American Chemical Society

the derivation of MSMs with the novelty of exploiting a formulation based on master equations16 rather than the more common approach using transition matrices. We do not attempt to replicate the extensive functionalities of MSMBuilder or PyEMMA. Instead, MasterMSM incorporates a few alternative algorithms for the calculation of rate matrices from MD using a variety of methods, provides estimators for reactive fluxes and committors, allows clustering of MSMs into few-state models and incorporates a sensitivity analysis method for networks of metastable states. MasterMSM can be particularly useful in the context of peptide folding as it includes a torsion angle based discretization that makes use of core-sets.16,17 In this paper we introduce the new package and present some of its more novel functionalities. First we present a few minimal notions to the theory underlying the construction of MSMs. Then we describe the organization of the software package. We conclude presenting the application of MasterMSM to two simple but realistic examples, a Brownian dynamics simulation in a one-dimensional potential and a short alanine-based peptide.



THEORY Chemical Master Equation and Markov State Models. Typically, MSMs are introduced starting from the definition of the transition matrix, T(Δt), for a discrete-time Markov process. The entries Tji(Δt) in the N × N column transition Received: June 6, 2019

A

DOI: 10.1021/acs.jcim.9b00468 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Application Note

Journal of Chemical Information and Modeling

that uses a simple Taylor expansion of the exponential function.21 In this way, we obtain the matrix elements as kji = Tji(Δt)/Δt and the nondiagonal elements from mass conservation as before. This approximation is exact as Δt → 0 and gradually deteriorates for increasing lag times. Reactive Flux Calculation of Committors and Sensitivity Analysis. In the study of conformational transitions in biomolecules, it is often interesting to identify how close a given state may be to either “reactants” or “products” (e.g., folded/unfolded for folding, bound/unbound for binding). The committor (termed pfold in the protein folding literature) allows establishing this quantitatively, as it is defined as the probability that the system will reach a state A (e.g., folded, where pfold = 1) before reaching another state B (e.g., unfolded, where pfold = 0). Importantly, the transition state ensemble corresponds to the case of pfold ≃ 0.5. We derive the value of pfold using the Berezkhovskii−Hummer−Szabo method,22 which also returns reactive fluxes J from the calculated rate matrix K. The value of the flux for individual transitions (Jji) is useful to identify the highest flux pathways that connect the end states.22 A precondition for this type of analysis is that we must be able to label sets of microstates in the transition network that belong to either A or B within our discretization. To estimate the value of pfold for the remainder of microstates (defined in the method as belonging to the intervening region I), we use22

matrix correspond to the probabilities that the system will be found in microstate j after a time interval Δt provided it was in microstate i at an initial time t, i.e., Tji(Δt) = p(j, t + Δt|i, t).18 Given a transition matrix T(Δt), one can determine the time evolution of the system at discrete time intervals, t = nΔt with n ∈ IN, as p(t ) = [T(Δt )]n p(0)

(1)

where the column vector p(t) describes the time-dependent population for the N microstates in the system. Instead, here we concentrate in the rate matrix, K, from Zwanzig’s master equation16 dp(t ) = Kp(t ) dt

(2)

The elements of K are the rate coefficients, k ji , corresponding to the transition i → j. The solution to the master equation is conveniently given by the matrix exponential, p(t ) = exp(Kt )p(0)

(3)

Indeed there is a straightforward equivalence between the transition and the rate matrices T(Δt ) = exp(KΔt )

(4)

however, deriving the rate matrix from the transition matrix turns out to be practically challenging due to the numerical issues in the calculation of matrix logarithms.19 Calculating the Rate Matrix. Given that MasterMSM focuses on the estimation of rate matrices from MD simulation data, it includes different methods for deriving them. First, we include both the lifetime based (LB) and maximum-likelihood propagator based (MLPB) methods described by Buchete and Hummer for the estimation of the rate matrix K.16 The LB method is more direct, as the rate matrix elements are simply calculated using kji =

∑ pfold (j)kji + ∑ kji = 0 j∈I

The calculation for the reactive flux uses this result to obtain22 J=

(5)

(8)

s

global (e.g., folding) rate. This calculation may allow identifying regions within a network where changes, for example, via mutation, would create a more disruptive effect.



WORKFLOW In Figure 1, we show a schematic with the steps involved in the construction of dynamical models within MasterMSM. First, the time series data is parsed with the trajectory module and mapped onto the discrete state space. The resulting discretized trajectories can then be used by the msm module to produce dynamical models. Finally, these can be coarse grained into few-state models using the fewsm module. Below, we briefly summarize each of these steps. Parsing Trajectories. MasterMSM can read trajectories already discretized using an external tool or directly from MD codes. These are processed borrowing the functionalities of the MDtraj package.24 Given our early interest in the study of short peptides,21,25 in the development of MasterMSM we have implemented a dihedral angle-based discretization. In particular, we have included in the trajectory module a

∏ p(j , Δt |i , 0)N

ji

ji

kjipi [pfold (j) − pfold (i)]

Here we are labeling with a star (★) sets of microstates within I that are in the A or B sides of a dividing line Σ. Using this expression for the reactive flux, we developed a sensitivity analysis method for MSMs,23 where we reached compact expressions for a sensitivity parameter that would capture the effects in the reactive global rates given changes in the free ∂ energy of specific microstates, α = ∂g ln k f , where kf is the

In this equation, Ti is the average lifetime of state i, and the branching probability Πji is calculated from the transition N sym counts for the event i → j, Nji, as Πji = Nsym ji /∑l = 1Nli , with Nsym = N + N . For diagonal elements, we simply impose the ji ji ij condition of mass conservation as kii = −∑j≠ikji.16 The MLPB method requires the optimization of the elements of the rate matrix in order to maximize a likelihood function for a discrete state trajectory. Considering that the elements of the matrix exponential correspond to the probabilities of observing such transitions, [exp(KΔt)]ji = p(j, Δt|i, 0), we can write a likelihood function for the observed transitions as16 3=

∑ i ∈ A★j ∈ B★

Πji Ti

(7)

j∈A

(6)

where the product runs over all pairs of final and initial states ji. Using the condition of detailed balance kjipi = kijpj, we can optimize just one triangle of the N × N rate matrix and the N − 1 independent microstate populations. We have implemented simulated annealing with a Metropolis Monte Carlo algorithm20 to perform a search over this parameter space. Finally, we also include an approximate method for calculating rate matrices from the transition matrix T(Δt) B

DOI: 10.1021/acs.jcim.9b00468 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Application Note

Journal of Chemical Information and Modeling

Figure 1. Schematic flowchart of the steps required to build master equation models within the MasterMSM package. Boxes correspond to the main three modules in the package, and with asterisks we mark the main Python classes in each.

number of functions that make use of transition based assignment (TBA) in dihedral angle space as introduced by Buchete and Hummer,16 where each amino-acid can either be in α-helical (A) or extended (E) configurations. We note that finding the partitioning of conformational space that is most kinetically relevant is an area of extensive development in the field.26 The main output from the trajectory module is a time series of discrete states, encapsulated in an instance of the TimeSeries class. Generating and Validating Dynamical Models. From one or a collection of TimeSeries objects we can generate dynamical models, which in MasterMSM are implemented within the MSM class. Each of these is characterized by the lag time for its construction. One typically needs to build multiple MSMs in order to see whether they conform with quality control criteria. For this reason, we have implemented the SuperMSM class that allows one to control the construction and validation of MSMs. Producing Few-State Models from Microscopic MSMs. After generating and validating an MSM, one may want to take advantage of the time-scale separation to build approximate “few-state” models that give an intuitive description of the system. This possibility is built in the MasterMSM package within the fewsm module. We include methods that allow partitioning the microscopic model into a set of M coarse states based on the sign structure or the value of the eigenvectors for the M − 1 slowest modes.16 Again, we note that this is an area of great development and multiple methods have appeared in the literature for the coarse-graining of MSMs.27,28

Figure 2. Master equation model of a one-dimensional two-state system. (A) Time series data from Brownian dynamics simulations on the model potential. In this case, simulations were run using β = 1, x‡ = 4, βΔG‡ = 5, D = 1, and a time step of dt = 1 × 10−4. (B) Time series data for a fragment for the continuous (top) and discretized (bottom) trajectories. (C) Dependence of the slowest relaxation time (τ) estimated from the transition matrix with the lag time (Δt). Error bars are derived from bootstrap tests. The cross at Δt = 1 marks the relaxation time from the LB rate matrix. (D, E, F) Count, transition, and LB rate matrices. Both T and N are shown in logarithmic scale for lag time Δt = 1. (G) Spectrum of relaxation times at lag time Δt = 1. (H) Values for the stationary (ψR0 ) and first (ψR1 ) right eigenvectors as a function of the state index. (I) Values of pfold and the sensitivity parameter α for the different microstates in the system. The green and red bands mark the regions corresponding to the “definitely folded” and “definitely unfolded” states.

scripting capabilities can be used for custom-defined procedures for the discretization. As mentioned above, a key issue in the construction of meaningful MSMs is validation. Here we resort to the simplest test for checking the consistency of the MSM against the lag time (see Figure 2C). The relaxation times derived from this analysis are clearly independent of Δt for values spanning 2 orders of magnitude, which suggests that, even at Δt = 1, the dynamics of the system can be described properly by the Markovian model. MasterMSM can also carry out more robust tests of the resulting model, like checking the results against autocorrelation functions16 or the Chapman−Kolmogorov test.30 In Figure 2D−F we show the count matrix N, the transition matrix T, and the rate matrix K derived using the LB method.16 Alternative procedures for calculating rate matrices, described above, are also readily available in the package. We note that the relaxation time obtained from the rate matrix is perfectly



APPLICATION EXAMPLES Below we present two application examples of MasterMSM for two different model systems. These are included with the package as Jupyter notebooks. Sample Case 1: Two State Model in One Dimension. For our first example, we focus on the case of diffusion on a one-dimensional, two-state model potential defined as F(x) = ΔF‡f(x/x‡), where f(x) = −2x2 for 0 ≤ |x| ≤ 1/2 and f(x) = 2(| x| − 1)2 − 1 for 1/2 < |x|.29 In Figure 2A, we show a representative trajectory produced running Brownian dynamics. Using tools from NumPy, we binned the data evenly into 25 microstates and in this way obtained a direct mapping between the continuous time series on x and the discrete microstate trajectory (Figure 2B). Alternatively, the Python C

DOI: 10.1021/acs.jcim.9b00468 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Application Note

Journal of Chemical Information and Modeling consistent with that obtained from the transition matrix (cross in Figure 2C). For this shortest lag time, we also show the spectrum of eigenvalues of T (Figure 2G), which exhibits the time-scale separation between the first and second slowest relaxation times that we expect of a two-state model. In this case the fast modes cannot be determined with confidence as their time-scale is comparable to Δt. We can also derive the eigenvectors for the stationary mode (ψR0 ) and for the slowest mode (ψR1 , Figure 2H). The former corresponds to the equilibrium distribution while the latter indicates the microstates that are exchanging at the slowest relaxation time as they have ψR1 values of opposite sign. We use the reactive flux method to derive the values of the committor (Figure 2I, top). In this case we define the end states from the regions in the network that are at the bottom of each of the free energy wells (marked with green and red bands). As we may expect given the simplicity of the model potential, there is a sharp increase in the value of the committor precisely at the barrier top. Finally, we include the result of the sensitivity analysis, which indicates regions where changes in the free energy would induce a net increase (for α > 0) or decrease (for α < 0) in the rate of the global transition (Figure 2I, bottom). Clearly, if a mutation raised the value of the free energy barrier the global rate would decrease, as we observe. Sample Case 2: Alanine Pentapeptide. Our second example is an atomistic MD data set for the terminally blocked alanine pentapeptide simulated using the ff03★ force field2 in a 3.2 nm length cubic box of TIP3P water. The data was generated using the Gromacs software package.31 An equilibrium simulation was run in the NVT ensemble at 300 K for 400 ns, using a stochastic dynamics integrator with a friction constant of 1 ps−1. We discretize the simulation data using a simple, Ramachandran-angle based method that considers two possibilities, extended (E) and α-helical (A) (see Figure 3A,B), resulting in a state space with 32 possibilities16 (from fully extended, EEEEE, to fully helical, AAAAA, see Figure 3C). We use the methods in the msm module to derive the transition matrix, the LB rate matrix, and the approximate rate matrix. The TBA assignment of transitions results in a well converged MSM even at low values of the lag time, Δt (Figure 3D). We also show the LB rate matrix relaxation times, that in this case are in excellent agreement with those of the transition matrix. When we analyze the resulting MSM, we find that for this force field, the most helical state (AAAAA) is the most stable (see Figure 3E). There is a gap in the time-scales between the first dynamical mode and the faster relaxation times. The dynamical process associated with the slow mode can be interpreted based on the sign structure of the associated eigenvector, which in this case involves the exchange between helical states and extended states (i.e., folding of the peptide, Figure 3E). We calculate committors and reactive fluxes, which can be useful for representing the network (Figure 3F). Finally, using the fewsm module we lump the microstates of the network onto two coarse-grained states, taking advantage of the time-scale gap. In particular, in this case we group microstates based on their position within the distribution of values of eigenvector ψL1 .16 This results in a simplified two-state description of the system (Figure 3C, bottom) that tracks the changes in the helicity ( f helix) of the peptide.

Figure 3. Torsional MSM from Ala5 MD simulations. (A) Ramachandran maps for each of the five residues in the peptide. (B) Time series data for a ψ torsion angle from one amino acid (top) and to the E and A coarse states (bottom). (C) Time series data for the coarse-state assignment to the 32 state space (blue), estimation of the helicity (orange), and coarse grained state assignment after lumping (green). (D) Dependence of the slowest relaxation times with the lag time for the calculation of the MSM. Squares mark the values of the relaxation times from the approximate rate matrix. (E) Equilibrium distribution (top) and first left eigenvector (bottom) of the MSM. (F) Simplified network representation of the MSM. Flux goes from the fully extended state (EEEEE) to the fully helical state (AAAAA). The node size is proportional to the equilibrium population, the edge width to the reactive flux, while the color scale corresponds to the pfold (blue, 0; white, 0.5; orange, 1). Only the nodes connected by highest fluxes are kept in the representation for simplicity.



CONCLUSIONS We have introduced the MasterMSM package, which we hope will be useful for the analysis of MD simulations of biomolecular systems. Although we do not attempt to include the extensive number of features of PyEMMA or MSMbuilder, we include a few novel functionalities that can be of particular interest when the major focus of the study is the determination of a rate matrix or in particular cases like the study of short peptides. The code is written in Python and can be easily extended with novel methods within each of the main modules.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. D

DOI: 10.1021/acs.jcim.9b00468 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Application Note

Journal of Chemical Information and Modeling ORCID

(16) Buchete, N.-V.; Hummer, G. Coarse Master Equations for Peptide Folding Dynamics. J. Phys. Chem. B 2008, 112, 6057−6069. (17) Schutte, C.; Noé, F.; Lu, J.; Sarich, M.; Vanden-Eijnden, E. Markov State Models Based on Milestoning. J. Chem. Phys. 2011, 134, 204105. (18) Noé, F.; Fischer, S. Transition Networks for Modeling the Kinetics of Conformational Change in Macromolecules. Curr. Opin. Struct. Biol. 2008, 18, 154−162. (19) Prinz, J.-H.; Chodera, J. D.; Pande, V. S.; Swope, W. C.; Smith, J. C.; Noé, F. Optimal Use of Data in Parallel Tempering Simulations for the Construction of Discrete-State Markov Models of Biomolecular Dynamics. J. Chem. Phys. 2011, 134, 244108. (20) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087−1092. (21) De Sancho, D.; Mittal, J.; Best, R. B. Folding Kinetics and Unfolded State Dynamics of the GB1 Hairpin from Molecular Simulation. J. Chem. Theory Comput. 2013, 9, 1743−1753. (22) Berezhkovskii, A.; Hummer, G.; Szabo, A. Reactive Flux and Folding Pathways in Network Models of Coarse-Grained Protein Dynamics. J. Chem. Phys. 2009, 130, 205102. (23) De Sancho, D.; Kubas, A.; Wang, P.-H.; Blumberger, J.; Best, R. B. Identification of Mutational Hot Spots for Substrate Diffusion: Application to Myoglobin. J. Chem. Theory Comput. 2015, 11, 1919− 1927. (24) McGibbon, R. T.; Beauchamp, K. A.; Harrigan, M. P.; Klein, C.; Swails, J. M.; Hernández, C. X.; Schwantes, C. R.; Wang, L.-P.; Lane, T. J.; Pande, V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528−1532. (25) De Sancho, D.; Best, R. B. What Is the Time Scale for α-Helix Nucleation? J. Am. Chem. Soc. 2011, 133, 6809−6816. (26) Pérez-Hernández, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification of Slow Molecular Order Parameters for Markov Model Construction. J. Chem. Phys. 2013, 139, 015102. (27) Beauchamp, K. A.; McGibbon, R.; Lin, Y.-S.; Pande, V. S. Simple few-state models reveal hidden complexity in protein folding. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 17807−17813. (28) Hummer, G.; Szabo, A. Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models. J. Phys. Chem. B 2015, 119, 9029−9037. (29) Cossio, P.; Hummer, G.; Szabo, A. On Artifacts in SingleMolecule Force Spectroscopy. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 14248−14253. (30) Prinz, J.-H.; Wu, H.; Sarich, M.; Keller, B.; Senne, M.; Held, M.; Chodera, J. D.; Schutte, C.; Noé, F. Markov Models of Molecular Kinetics: Generation and Validation. J. Chem. Phys. 2011, 134, 174105. (31) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. C. GROMACS: Fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701−1718.

David de Sancho: 0000-0002-8985-2685 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS D.d.S. would like to thank Robert B. Best, Nicolai-Viorel Buchete, Frank Noé, Edina Rosta, Jacob Stevenson, and William Jacobs for helpful discussions. This work was supported by Grants CTQ2015-65320-R and RYC-201619590 from the Ministry of Science, Research and Universities, Grant IT1254-19 from the Basque Government. and an Ikerbasque Research Fellowship to D.d.S. A.A. acknowledges financial support from DIPC.



REFERENCES

(1) Best, R. B. Atomistic Molecular Simulations of Protein Folding. Curr. Opin. Struct. Biol. 2012, 22, 52−61. (2) Best, R. B.; Hummer, G. Optimized Molecular Dynamics Force Fields Applied to the Helix-Coil Transition of Polypeptides. J. Phys. Chem. B 2009, 113, 9004−9015. (3) Piana, S.; Lindorff-Larsen, K.; Shaw, D. How Robust are Protein Folding Simulations with respect to Force Field Parameterization? Biophys. J. 2011, 100, L47. (4) Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmüller, H.; MacKerell, A. D., Jr CHARMM36m: an Improved Force Field for Folded and Intrinsically Disordered Proteins. Nat. Methods 2017, 14, 71. (5) Robustelli, P.; Piana, S.; Shaw, D. E. Developing a Molecular Dynamics Force Field for both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, E4758−E4766. (6) Stone, J. E.; Hardy, D. J.; Ufimtsev, I. S.; Schulten, K. GPUAccelerated Molecular Modeling Coming of Age. J. Mol. Graphics Modell. 2010, 29, 116−125. (7) Shaw, D. E.; Dror, R. O.; Salmon, J. K.; Grossman, J.; Mackenzie, K. M.; Bank, J. A.; Young, C.; Deneroff, M. M.; Batson, B.; Bowers, K. J.et al. Millisecond-Scale Molecular Dynamics Simulations on Anton. In Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis, Portland, OR, November 14−20, 2009; p 39. (8) Rao, F.; Caflisch, A. The Protein Folding Network. J. Mol. Biol. 2004, 342, 299−306. (9) Swope, W. C.; Pitera, J. W.; Suits, F. Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. J. Phys. Chem. B 2004, 108, 6571−6581. (10) Bowman, G. R., Pande, V. S., Noé, F., Eds. An Introduction to Markov State Models and their Application to Long Timescale Molecular Simulation; Advances in Experimental Medicine and Biology, Vol. 797; Springer Science + Business Media, 2014. (11) Chodera, J. D.; Noé, F. Markov State Models of Biomolecular Conformational Dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135− 144. (12) Shukla, D.; Hernández, C. X.; Weber, J. K.; Pande, V. S. Markov State Models Provide Insights into Dynamic Modulation of Protein Function. Acc. Chem. Res. 2015, 48, 414−422. (13) Scherer, M. K.; Trendelkamp-Schroer, B.; Paul, F.; PérezHernández, G.; Hoffmann, M.; Plattner, N.; Wehmeyer, C.; Prinz, J.H.; Noé, F. PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. J. Chem. Theory Comput. 2015, 11, 5525−5542. (14) Harrigan, M. P.; Sultan, M. M.; Hernández, C. X.; Husic, B. E.; Eastman, P.; Schwantes, C. R.; Beauchamp, K. A.; McGibbon, R. T.; Pande, V. S. MSMBuilder: Statistical Models for Biomolecular Dynamics. Biophys. J. 2017, 112, 10−15. (15) Doerr, S.; Harvey, M.; Noé, F.; De Fabritiis, G. HTMD: HighThroughput Molecular Dynamics for Molecular Discovery. J. Chem. Theory Comput. 2016, 12, 1845−1852. E

DOI: 10.1021/acs.jcim.9b00468 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX