An Integrated Markov State Model and Path Metadynamics Approach

21 hours ago - Unveiling the mechanistic features of drug-target binding is of central interest in biophysics and drug discovery. Herein, we address t...
0 downloads 0 Views 4MB Size
Subscriber access provided by Gothenburg University Library

Biomolecular Systems

An Integrated Markov State Model and Path Metadynamics Approach to Characterize Drug Binding Processes Mattia Bernetti, Matteo Masetti, Maurizio Recanatini, Rommie E. Amaro, and Andrea Cavalli J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.9b00450 • Publication Date (Web): 22 Aug 2019 Downloaded from pubs.acs.org on August 25, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

An Integrated Markov State Model and Path

2

Metadynamics Approach to Characterize Drug

3

Binding Processes

4

Mattia Bernetti†,‡, Matteo Masetti*†, Maurizio Recanatini†, Rommie E. Amaro§, and

5

Andrea Cavalli*‖

6

†Department of Pharmacy and Biotechnology, Alma Mater Studiorum – Università di

7

Bologna, Via Belmeloro 6, I-40126 Bologna, Italy

8

§Department of Chemistry and Biochemistry, University of California, San Diego, 9500

9

Gilman Drive, La Jolla, California 92093-0340, United States

10

‖Computational & Chemical Biology, Istituto Italiano di Tecnologia, Via Morego 30, I-

11

16163 Genova, Italy

12

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Markov

State

Models;

Transition-Path

Page 2 of 64

1

KEYWORDS.

Theory;

Well-Tempered

2

Metadynamics; Path Collective Variables; Miminum Free Energy Pathway; 2-Adrenergic

3

Receptor.

4

5

ABSTRACT. Unveiling the mechanistic features of drug-target binding is of central

6

interest in biophysics and drug discovery. Herein, we address this challenge by combining

7

two major computational approaches, namely Molecular Dynamics (MD) simulations and

8

Markov State Models (MSM), with a Path Collective Variables (PCVs) description coupled

9

with metadynamics. We apply our methodology to reconstruct the binding process of the

10

antagonist alprenolol to the β2-adrenergic receptor, a well-established pharmaceutical

11

target. The devised protocol allowed us to estimate the binding free energy and identify

12

the minimum free energy path leading to the protein-ligand complex. In summary, we

13

show that MSM and PCVs can be efficiently integrated to shed light upon mechanistic

14

and energetic details underlying complex recognition processes in biological systems.

ACS Paragon Plus Environment

2

Page 3 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

2

1. INTRODUCTION

3

Computational simulations are becoming increasingly effective in complementing experiments

4

and providing atomic-level descriptions of complex biological phenomena.1–5 In the context of

5

drug discovery, Molecular Dynamics (MD) simulations and related methods are widely employed

6

to compute key thermodynamic quantities in an increasingly reliable and routine fashion.6,7 Owing

7

to recent advances in software8–11 and hardware,12,13 predicting drug-target (un)binding

8

mechanisms and associated rates is also now within reach.6,14–22 Still, the timescales gap between

9

simulations and experiments remains an open challenge that potentially undermines any

10

computational investigation in this field.23,24

11

Typically, two classes of MD simulation-based approaches can be adopted to address the

12

challenge of bridging scales.6,14 Biased-MD simulation methods, including metadynamics25

13

among others, rely on the notion of reaction coordinates (or collective variables, CVs) whereby

14

additional forces are exploited to accelerate rare events.26 As the free energy surface as a function

15

of CVs is a natural outcome of these methods, a detailed mechanistic interpretation of the process

16

is also guaranteed. If a proper choice of CVs is made, reliable free energy estimations can be

17

attained at a moderate expense of computational resources.27 Unfortunately, choosing the best CVs

18

is not always an easy task, and time consuming trial-and-error procedures are usually exploited to

19

this aim. When prior knowledge about the investigated process is available, an optimal description

20

of the event can be achieved through the Path-CVs (PCVs) formalism.28 PCVs allow to trace the

21

minimum free energy path (MFEP) with high accuracy and have been successfully employed to

22

describe complex biomolecular processes, including conformational transitions,29 ion

23

conduction,30 and ligand (un)binding.31–33 However, since the so-called “guess path” is required

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 64

1

by construction, the practical usage of PCVs is strongly limited in real-life scenarios.34

2

Alternatively to biased-MD simulation methods, spontaneous binding events can also be obtained

3

through extensive conventional MD simulations in the microsecond to millisecond timeframe.35–

4

37

5

to make optimal usage of distinct simulations towards a statistical description of binding.38–43

6

Notably, brute force sampling followed by MSM does not require any prior knowledge about the

7

investigated process, and it allows relevant thermodynamic and kinetic quantities to be extracted.44

8

Unfortunately, the computational resources needed to achieve reliable estimates are still daunting

9

for the vast majority of the community and sampling of high energy states remains quite poor even

10

In this context, Markov State Models (MSM) provide the appropriate mathematical framework

when running remarkably long conventional MD simulations.

11

Here, we combine MSM and PCVs to take advantage of the individual benefits of the two

12

methods. Our protocol rests on the assumption that at least a few binding events can be observed

13

through conventional MD simulations. Subsequently, a preliminary MSM is constructed in a low-

14

data regime, and the outcome is employed as a template to construct the guess path, which is

15

required for PCVs-based simulations. In this way, only the most relevant portion of the phase space

16

is resampled through metadynamics, improving sampling around transition states and high energy

17

regions. In Figure 1 a schematic picture of the whole framework is provided. We show that this

18

procedure leads to a highly detailed mechanistic interpretation of the protein-ligand recognition

19

process while saving valuable computational time. Additionally, the free energy profile associated

20

with the binding mechanism allowed us to pinpoint free energy minima and transition states, and

21

to estimate the binding free energy in a remarkable agreement with experimental measures. As a

22

proof of concept, we apply the methodology to a pharmaceutically relevant problem, namely the

23

association of the drug alprenolol to the β2-adrenergic receptor (β2-AR, see Figure 2).37

ACS Paragon Plus Environment

4

Page 5 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1 2

2. RESULTS AND DISCUSSION

3

2.1 From a discrete-states kinetic model to Collective Variables.

4

A subset of eleven previously computed binding trajectories collected on the Anton specialized

5

hardware architecture (Table S1) was employed to build the kinetic model.37 In all the runs, the

6

ligand successfully reached the orthosteric pocket inside the receptor through strikingly similar

7

pathways. Moreover, in six of these trajectories the crystallographic geometry was recapitulated

8

within an RMSD lower than 0.8 Å.37 We built the MSM through the PyEMMA 2 software

9

package,45 using a lag time of 100 ns, and further analyzed the kinetic model with Transition-Path

10

Theory (TPT)46 in order to extract the most relevant transition pathways from the ensemble of

11

microstates.47,48 Provided two endpoints, which corresponded in this case to the protein-ligand

12

encounter complex (EC) and the bound state, TPT computes the net flux over all the possible

13

transitions between each pair of microstates. The network of fluxes is shown in Figure 3, where

14

the A state corresponds to the EC, while B represents the bound state. As the Figure shows, while

15

a dominant pathway could be easily recognized for the initial steps of the association process,

16

several routes were found in the proximity of the B endpoint. Thus, to extract a significant

17

sequence of intermediates, a suitable prioritization scheme was devised allowing us to identify

18

four relevant states (I1-4, Figure 3) highly involved in the total binding flux. The geometry of these

19

intermediates, along with their sequential order, can be considered as hallmarks (or “frames”

20

according to the PCVs jargon) that must be preserved when building PCVs. However, a functional

21

ensemble of frames (or frameset) suitable to the PVCs framework must not only preserve the

22

proper sequence of events, but it must also ensure a smooth and high-resolution transition along

23

the specific PCV controlling the progression along the path, i.e. the S variable.28 These

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 64

1

requirements translate into relatively small and homogeneous inter-frame distances, respectively,

2

that were not satisfied by the original ensemble of frames (Table S2). Nevertheless, once the

3

ordered sequence of intermediates is known, a better-suited frameset for PCVs can be easily

4

extracted by either re-processing the conventional-MD simulation trajectories or performing

5

focused biased-MD simulations (see Methods). Here, the latter strategy was undertaken,

6

employing steered-MD simulations to link the six parent stations into the final guess path.31,33

7

Retrospective validation of the obtained path was achieved by projecting the conventional-MD

8

simulation trajectories onto the PCVs space (see Figure 4). Interestingly, all the productive

9

trajectories (i.e. those leading to the crystallographic geometry) were found at low Z values,

10

providing initial clues about the suitability of the procedure. In fact, the Z PCV describes the

11

distance as compared to the guess path, which is ideally found at Z = 0 for all values of S.

12

Conversely, at high Z values alternative paths can be found. Most notably, all the non-productive

13

trajectories were mapped at approximately Z > 16 Å. This information turned out to be

14

instrumental for the subsequent resampling phase, as it provides an upper value for Z that can be

15

exploited to limit the exploration of the phase space on the most relevant region of the binding

16

pathway.

17 18

2.2 Detailed mechanistic interpretation of the drug-target recognition process.

19

The free energy surface (FES) reconstructed after 2 μs of metadynamics sampling is shown in

20

Figure 5, together with relevant configurations found along the binding pathway (M1-12, see also

21

Figure 6). By partitioning the protein-ligand bound/unbound region of the PCVs space at

22

approximately S = 20 (see below), a standard binding free energy of about −11.6 kcal/mol was

23

estimated,32,49 in remarkable agreement with experimental data (−12.2 kcal/mol).37 The error on

ACS Paragon Plus Environment

6

Page 7 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

the free energy profile, as estimated through block analysis (see Methods), amounts to

2

0.2 kcal/mol. This error can be interpreted as the uncertainty affecting the free energy

3

that arises by considering trajectory sections (blocks) of increasing size and averaging

4

over all such sections. The procedure is repeated until the uncertainty becomes

5

independent from the block size. Such condition is only met once the correlation between

6

the trajectory data points, which is by definition associated with sequential MD trajectory

7

frames, is lost. In Figure S1, the time evolution of the free energy difference between

8

distinct regions of the PCVs space is also shown.

9

By looking at the FES, we can distinguish four main regions showing distinct energetic features.

10

In particular, we observe a deep (S ≥ 60) and a shallow (40 > S ≥ 20) energy minima region

11

separated by a transition area where the main barrier of the process is located. Moreover, a high-

12

energy region is also found at S values lower than 20. In this latter portion of the PCVs space, the

13

ligand was fully solvated, explaining the almost structureless features of the FES. Conversely, at

14

the opposite side of the plot, configuration M12 perfectly matched the native crystallographic

15

structure (PDB ID: 3NYA).50 As Figure 4A shows, the MFEP (calculated using the MEPSA

16

algorithm)51 was located at low Z values along more than half of the space spanned by S (red solid

17

line). This implies that, at least from M3 to M12, the frameset devised by our integrated procedure

18

was highly reliable, as an optimal guess path would ideally match the MFEP at Z = 0 for all S

19

values. It is nonetheless not surprising that lower accuracy was found in the remaining regions

ACS Paragon Plus Environment

7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 64

1

(MFEP at high Z values). Importantly, this suboptimal behavior was not related to the proposed

2

methodology. Indeed, since several competing routes can be followed by the ligand approaching

3

the receptor, the very concept of the path becomes rather arguable for the early stages of binding.

4

For the same reason, in Figure 5B the free energy profile along the MFEP is only shown for the

5

most relevant regions of the PCVs space, namely the portion corresponding to the M3-M12

6

transition.

7

Taken as a whole, the shallow minima portion of the FES entails an ensemble of states describing

8

the EC (see for example the M1 configuration in Figure 5A and 6B). Here, Alprenolol was still

9

partly solvated and established hydrogen bonds with Thr195 and Phe193 through its polar tail,

10

stationing with its principal axis parallel to ECL2 and ECL3. These two loops delimit a solvent-

11

exposed cavity on the protein surface, named the extracellular vestibule (EV), which is known to

12

accommodate ligands before their advancement towards the orthosteric site. In M3 (Figure 6C),

13

Alprenolol occupied the outermost region of the EV, enclosed between ECL3 and the lower

14

portion of ECL2 (ECL2b). Remarkably, this configuration, with alprenolol almost completely

15

desolvated and forming a hydrogen bond with Thr195 through the charged tail, corresponded to

16

pose 2 reported by Dror et al.37 It is important to note that the unbound to EV-bound state transition

17

was proposed as the major barrier for the whole process, and was mostly attributed to dehydration

18

of the binding partners.37 Our simulations do not support this picture, as no relevant barriers could

19

be observed from the unbound regions of the FES up to the M3 state. Nevertheless, as discussed

20

in the Methods section, we wish to highlight that our simulations were carried out using a different

21

force field, which may indeed be partly responsible for the observed discrepancies. Instead, the

22

transition state region of the FES roughly corresponds to configurational changes of alprenolol

23

within the EV, with the main barrier (≈ 6 kcal/mol) separating the outermost (M3) from the

ACS Paragon Plus Environment

8

Page 9 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

innermost (M6) regions of the vestibule. Here, by pivoting on the phenylethyl moiety and thus

2

accomplishing a half-turn rotation, the ligand underwent a substantial rearrangement to eventually

3

occupy the bottom of the EV (Figure 5C, 6D, and 6E). In configuration M6, alprenolol was

4

completely buried inside the protein and matched pose 3 reported by Dror et al.37

5

The final step of binding is represented by the ultimate transfer of alprenolol from the EV to the

6

orthosteric binding pocket. To do so, the ligand has to move through a narrow crevice between

7

ECL2 and helices H5-7, with the concurrent separation of two hydrophobic residues (Phe193 and

8

Tyr308). Notably, this step was proposed as the second energetic barrier encountered by the ligand

9

during the binding process.37 Interestingly, we did not observe any significant barrier at this stage.

10

Tyr308 was quite stable during the simulations, whereas Phe193 showed a wider degree of

11

flexibility. Indeed, as already observed in the original paper,37 the residues experienced frequent

12

conformational rearrangements regardless of the relative location of alprenolol. To further support

13

this observation, we monitored the distance between the center of mass of the phenyl rings of these

14

residues. In Figure 7A, the latter distance is shown as a function of the metadynamics simulation

15

time (red line) together with the S variable mapping the progress of the binding process (black

16

line). No apparent correlation between the two variables could be identified, as also pointed out

17

by the scatter plot reported in Figure 7B. Overall, three main states could be distinguished: a closed

18

state, where the side chain phenyl rings of these residues were in close contact (distance of about

19

4 Å), an open state, where a separation of about 8 Å was observed, and a wide-open state (distance

20

greater than 14 Å). An inter-residue distance of at least 8 Å was always found in the

21

correspondence of ligand ingress/egress, indicating that ligand passage required a sufficient

22

separation between Phe193 and Tyr308 (Figure 7C). Nevertheless, as shown by the red curve in

23

Figure 7A, the two residues were able to frequently interchange between such conformational

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 64

1

states over the entire course of the simulation regardless of the presence of alprenolol in their

2

surroundings. Thus, our simulations are indicating that no relevant barrier was associated with this

3

conformational rearrangement. Notably, configurations with a significantly larger inter-residue

4

separation, due to a further conformational rearrangement of Phe193 (observed in the simulation

5

time period 420-820 ns), were also captured in our simulations.

6

After overcoming the narrow passage through the Phe193 and Tyr308 gate (M8), a flip of the

7

polar tail resulted in a configuration displaying the hydroxyl group pointing upwards, in direction

8

of the previously occupied EV, and the charged nitrogen pointing downwards to the orthosteric

9

pocket (see Figure 6F and 6G). Maintaining this orientation and further advancing along the

10

crevice, a displacement of the tail of about 4 Å was observed in M9, making the positively charged

11

nitrogen approach Asp113 and Asn312, establishing a salt bridge with the former and a hydrogen

12

bond with the latter. Remarkably, this minimum represented the crossroad of two different routes

13

towards the bound state. Still focusing on the MFEP, M10 and M11 (Figure 6H) were

14

progressively visited, where the phenyl ring of alprenolol approached a hydrophobic area

15

comprising Val114 and Phe290. This configuration corresponded to pose 4 reported by Dror et

16

al.,37 indicated as the relevant intermediate prior to the native state. A flip of the hydroxyl towards

17

helices H6 and 7, in M11, was then necessary to adopt the final binding pose (M12), where a

18

network of interactions involved the polar tail and Asp113 and Asn312 and the ligand phenyl ring

19

accommodated in a hydrophobic region including Val114, Val117 and Phe290. The alternative,

20

higher-energy path from M9 went through M10’ and M12’ (dashed red line in Figure 4). The

21

former, quite spread and structurally diverse, was dominated by the flip of the ligand hydroxyl

22

towards H3 in order to interact with Asp113, in a diametrically opposed fashion compared to the

23

major path (Figure 8A). Additionally, the ligand phenyl explored different regions of the site,

ACS Paragon Plus Environment

10

Page 11 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

adopting a configuration closely resembling state 4’’ reported by Dror et al.,37 who suggested it as

2

an alternative metastable pose before adopting the final binding mode. In M12’ (Figure 8B), the

3

phenyl moiety visited different regions of the orthosteric site around the volume occupied in the

4

target M12 pose. Noteworthy was the occupancy of a small hydrophobic cavity including Phe289,

5

Phe290, Phe208, and Trp286, exposed after displacement of Phe290, to which the energy barrier

6

between M10’ and M12’ might be adduced.

7 8

2.3 The integrated MSM-Path Metadynamics approach into perspective.

9

In the last decade, a plethora of diverse computational methods aimed at addressing the

10

timescales problem have been proposed and applied to pharmaceutically relevant issues of

11

different complexity.14,22 Accordingly, it is possible today to take advantage of a rich pool of

12

methods purposely developed to assist the drug design process with tailored tradeoffs between

13

accuracy and efficiency. Some of these methods are designed to provide on- and off-rates of drug

14

binding together with the associated free energy at high accuracy. For example, building on an

15

MSM framework, the adaptive sampling scheme known as High Throughput MD (HTMD) is able

16

to reconstruct the whole drug binding process in a totally unsupervised fashion.39 Through an

17

iterative learning and sampling procedure, HTMD has proven to achieve a converged binding

18

affinity one order of magnitude faster than conventional MD simulations combined with MSM.

19

Another adaptive scheme is the WExplore method developed by Dickson and coworkers.52

20

Relying on a weighed ensemble (WE) algorithm,53 replicas of the system are evolved in parallel

21

and assigned to a statistical weight. Periodically, these replicas are cloned or merged based on the

22

extent of exploration of configurational space in a way to enhance the heterogeneity in under-

23

sampled regions. This cloning and merging operations are governed by the trajectories’ statistical

ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 64

1

weight, which is kept constant through the process so as to maintain an unbiased ensemble of

2

states. Over conventional WE, WExplore implements a hierarchical definition of sampling regions

3

based on dynamically defined Voronoi tessellations. This approach is most suited to study drug

4

unbinding, for which the protein-bound state is already known, and to provide reliable estimates

5

of drug residence time with a great saving of computational time.19–21 It has been recently argued

6

that, in general, adaptive sampling schemes may offer little advantage over conventional MD

7

simulations in getting faster drug binding processes in a totally unsupervised regime.24 However,

8

it has also been recognized that these methods hold great potential in providing an accurate

9

description of drug binding pathways as well as the ability to locate allosteric sites.24 A convenient

10

tradeoff between accuracy and efficiency is certainly provided by SEEKR,16,17 a hybrid

11

MD/Brownian Dynamics/milestoning methodology15 suited to efficiently reconstruct on- and off-

12

rates of drug binding processes through a multiscale approach. Conversely, concerning methods

13

more focused on the computational speed, we mention here the scaled MD approach54 and the

14

closely related τ-RAMD55 to get fast estimates of drug residence times and unbinding

15

pathways.56,57 Finally, when one is more interested in using MD simulation-based methods to

16

overcome limitations of conventional docking methods (dynamic docking), Supervised MD

17

(SuMD) simulations,58,59 the MD-binding protocol,60 as well as the “binding” variant of scaled

18

MD simulations,61 can be considered.

19

Where does the integrated MSM-Path Metadynamics approach fit within the multitude of

20

emerging methods? Even though a rigorous assessment of computational efficiency is out of the

21

scope of the present work, our results indicate that the proposed methodology can be ideally found

22

halfway between brute force MD-sampling and dynamic docking methods. Indeed, once a

23

functional path is obtained, metadynamics simulation can reach reasonable estimates of binding

ACS Paragon Plus Environment

12

Page 13 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

free energies (and related free energy profiles) in the order of microsecond long simulations. It is

2

however also true that much of the efficiency of the whole scheme depends on the ability of the

3

initial pool of trajectories in capturing the binding process with reasonable accuracy. This can only

4

be achieved once multiple protein-ligand recognition events are sampled via conventional MD

5

simulations. While such initial requirement unavoidably translates into substantial computational

6

costs, it is nevertheless difficult to provide an a priori estimate, and it is also expected to vary

7

considerably depending on the investigated system. In such a scenario, we note that our strategy

8

would probably be more suited to characterize the binding process of fragments and/or low-affinity

9

leads. Indeed, these compounds usually show faster association kinetics over drug-like

10

compounds, making the association process easier to observe with conventional computational

11

resources.62

12

While much of the above discussion was focused on sampling schemes and limitations thereof,

13

one should always be aware that MD simulations are also affected by force fields inaccuracies. In

14

this respect, we already mentioned that the choice of different force fields might be responsible for

15

the observed discrepancies between our results and those previously reported by Dror et al.37 Thus,

16

notwithstanding recent impressive advances in force fields development,63 one should always be

17

cautious in the interpretation of MD simulation results, especially when the goal is to obtain

18

quantitative estimates of free energy profiles and/or when electronic polarization effects are known

19

to play a major role in the process under study. Despite accounting for such contributions during

20

MD simulations is not straightforward, a QM/MM approach for refining the free energy profiles

21

within the framework of path collective variables has been recently proposed by Haldar et al.64 It

22

might be interesting to consider such methodology in future work especially to assess the relative

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 64

1

merit of the individual force fields and, when possible, to foster their improvement and

2

development.

3 4

3. CONCLUSIONS

5

In summary, through the protocol presented herein, we achieved a highly informative description

6

of alprenolol association to the 2-AR, unveiling mechanistic features of the process with an

7

unprecedented level of detail. Additionally, by relying on the reconstructed energy profile, we

8

were able to estimate a related binding free energy of -11.6 kcal/mol, in very good agreement with

9

experimental data (-12.2 kcal/mol). In this way, we have demonstrated how MSM together with

10

PCV can be integrated to tackle open issues in biophysics and, more generally, in drug design such

11

as drug-target binding. As such, our study shows that the combination of rigorous computational

12

methods (MSM and PCV) can provide highly reliable structural and thermodynamic predictions,

13

paving the way for innovative methods and protocols in computational drug design and discovery.

14 15

4. METHODS

16

4.1 Construction of the Markov State Model

17

Constructing an MSM can be roughly summarized in two major stages. First, the initial

18

trajectory set, in its conventional view as a series of structures over time, is converted into the so-

19

called discrete trajectories, that is a series of microstates over time. To this end, the high

20

dimensional space of the system configurations has to be mapped into a lower dimensional space

21

defined through the identification of relevant variables that are able to discriminate different states

22

along the binding process. Such procedure is usually referred to as state decomposition. Secondly,

23

jumps of fixed-size in time (or lag time) are carried out over these discrete trajectories, and the

ACS Paragon Plus Environment

14

Page 15 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

observed transitions in the microstate space are stored in a count matrix from which a transition

2

probability matrix is then obtained.22

3

In this context, major challenges are found in the exhaustiveness of the initial trajectory pool

4

and in the state decomposition procedure. As for the first aspect, it is essential that full binding

5

events are spontaneously observed, so as to ensure that all the degrees of freedom involved in the

6

process are reliably captured. On the one hand, a strength of MSMs is the possibility of aggregating

7

multiple trajectory data, so that individual portions of the whole binding process can be gathered

8

from independent trajectories. On the other hand, this may lead to disregarding some slow

9

structural rearrangement along orthogonal degrees of freedom that can possibly take place in the

10

different trajectories. As a consequence, when incorporating in the same microstate configurations

11

harvested from different trajectories, such configurations may in reality be kinetically disjoint and

12

thus unlikely to efficiently interconvert. Indeed, and moving to the second aspect, the aim is to

13

achieve a kinetically relevant definition of microstates through geometric clustering. While no

14

simple means are available to accomplish this task,65 such condition can be met to a certain degree

15

of approximation by grouping geometrically similar configurations, which are likely to rapidly

16

interconvert, into the same microstate. To this end, it is important to uniquely define each

17

microstate through careful identification of appropriate variables on which performing the

18

clustering procedure, so as to distinguish and consider all the relevant degrees of freedom involved

19

in the process under study.

20

Once the above steps have been carried out, a standard procedure to identify the optimal choice

21

of lag times to build the model, and then ensuring the condition of memory loss at the basis of the

22

Markovian assumption, is to monitor the relaxation time scales of the system dynamics as a

23

function of the lag time. When no clear-cut convergence of the relaxation time scales is observed,

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 64

1

additional simulations may be required before building the model to improve the statistical

2

uncertainty of undersampled regions.44

3 4

Below, we go through the details of the above aspects in our specific application to alprenolol2-AR binding.

5 6

The trajectory set. To build the MSM, we took advantage of the available simulations produced

7

by Dror et al. via unbiased sampling,37 reported in Table S1. Notably, in only 6 out of 12 runs the

8

authors successfully reproduced the binding pose observed in the crystal structure of the  2-AR in

9

complex with alprenolol (PDB ID: 3NYA)50 with an RMSD < 0.8 Å. Nevertheless, the ligand was

10

able to occupy the orthosteric site on the  2-AR in all of the 12 runs. In other words, in half of the

11

considered simulations, alprenolol was able to access the binding site but remained stuck in in non-

12

native binding modes. We aggregated the unbiased trajectories to build the MSM and in turn

13

extract information about ligand access to the orthosteric site. Remarkably, in the process of

14

constructing the MSM, one trajectory was purposely discarded from the dataset because of striking

15

differences found in the association pathway. This behaviour was already recognized in the

16

original paper by Dror et al.37

17

All of the unbiased simulations reported in the original paper by Dror et al. were started in the

18

unbound state, placing 10 replicas of the ligand in the bulk.37 Notably, a significant amount of

19

simulation time was spent by the ligands to wander in the solvent without accomplishing durable

20

contacts with the solvent-exposed surface of the protein.37 As such, with the aim to reduce as much

21

as possible the amount of data to process, and considering our final goal of achieving a thorough

22

understanding about the behaviour of the ligand from the surface to the orthosteric site, we

23

discarded these initial trajectory portions. To this end, for each trajectory, we defined contacts

ACS Paragon Plus Environment

16

Page 17 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

between alprenolol and  2-AR to have occurred when the distance between heavy atoms from the

2

two species was below a cut-off value of 5 Å in at least one frame. We then monitored the distance

3

values of these contacts over the trajectories and discarded the frames in which the minimum value

4

was above a 30 Å cut-off. By applying this procedure, a total number of 296,659 frames from the

5

initial unbiased trajectory set was selected. All this procedure was carried out through in-house

6

python routines that are available upon request.

7 8

State decomposition and choice of the clustering metric. When constructing an MSM, the

9

first step is the selection of appropriate variables to map the configurations of the system into a

10

low-dimensionality space in which subsequently identifying the microstates of the kinetic model

11

by cluster analysis (state decomposition).44,65,66 By the term appropriate, we mean variables able

12

not only to recognize distinct states of the system along the process under investigation but also to

13

describe the dynamics thereof.44,65 The goal, in our case, was to monitor the advancement of

14

alprenolol along its route from the surface of the protein to the orthosteric site.

15

While using the RMSD of the ligand after alignment on the protein’s frame would be an obvious

16

choice, inspired by previous work67 we selected the minimum distance between protein and ligand

17

heavy atoms as variables to build the model. Another aspect to consider when choosing the

18

variables is the trade-off between their effectiveness and their total amount. As a clustering

19

procedure is applied, the larger the number of variables, the highest the computational effort

20

required. Therefore, we only considered proteins’ -carbons and the heavy atoms for the ligand.

21

Moreover, as the association process involved directly only a fraction of the protein residues, we

22

restricted our selection to the residues belonging to the bulk-exposed surface, the extracellular

23

vestibule, the narrow passage leading to the binding pocket, and the residues comprised in the

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 64

1

orthosteric site. As a result, 64 protein residues were included through this selection (Figure S2).

2

Similarly, we only considered 5 heavy atoms that sufficed in describing major rearrangements in

3

the ligand’s configurations (Figure S2): the two oxygen atoms from the phenol ether and the

4

hydroxyl groups, the nitrogen present in the secondary amine group, which is positively charged

5

at the physiological pH, and two carbon atoms. The two oxygens and the nitrogen allowed the

6

ligand to interact with protein residues by means of hydrogen bonds and salt bridges. Differently,

7

the two carbon atoms were required to describe the phenyl ring flipping, which could occur

8

regardless of the position of the other chemical groups. As a result, this led to a set of variables

9

consisting of 320 minimum distances between these two groups of atoms.

10

Once appropriate variables are chosen, and their values computed for each trajectory frame, the

11

clustering procedure can be carried out. The typical procedure is to devise a kinetically relevant

12

geometric clustering, in such a way that similar configurations in terms of geometry are likely to

13

rapidly interconvert. In order to cluster our data, we employed the K-means clustering algorithm,68

14

which has gained great popularity in the MSM community and has been employed with

15

satisfactory results in several previous works.69–71 This clustering algorithm tends to place more

16

clusters in more densely sampled regions, which avoids creating too many groups with small count

17

and thus favouring a more accurate determination of the transition probabilities. By construction,

18

the K-means algorithm requires the user to specify the number of output clusters. Thus, an

19

increasing number of clusters was empirically tested: 50, 100, 150, 200, 300, and 500. We

20

performed the cluster analysis using the 320 protein-ligand minimum distances as input variables.

21

For an easier visualization, we projected the resulting clusters in a more intuitive 2D space. Thus,

22

for each of the obtained clusters, we calculated the RMSD of the ligand with respect to the native

23

state (PDB ID: 3NYA)50 for all of the configurations contained in that cluster and took the average.

ACS Paragon Plus Environment

18

Page 19 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Similarly, the average distance of the ligand center-of-mass (COM) between the configurations

2

included in a cluster and the crystallographic complex was employed as a second dimension. The

3

result for the different number of clusters is displayed in Figure S3. As shown by the figure, when

4

increasing the number of clusters from 50 to 100 and 150, the inclusion of sampled points was

5

significantly improved. Conversely, a further increase to 200, 300 and 500 did not broaden

6

significantly the distributions of the clusters. Aiming at maintaining the number of clusters as more

7

limited as possible, we thus used 150 clusters to build our model.

8 9

Lag time selection. After defining the microstates, the trajectories are scanned to identify a

10

series of microstates over time, obtaining the so-called discrete trajectories.44,65,66,72 Then, jumps

11

of fixed size are performed along these discrete trajectories, and the transitions observed through

12

such jumps in the microstate space are recorded and stored in a count matrix that is subsequently

13

converted into a transition probability matrix. The size of jumps, referred to as the lag time of the

14

model, is a critical parameter of the whole procedure, as different lag times lead to different count

15

matrices. Under the Markovian assumption, systems are memoryless, that is the probability of

16

being in the current state depends only on the previous state.44,65,66,72 As such, transition counts

17

can be considered statistically independent. Accordingly, the Markov time is defined as the

18

smallest lag time that provides a Markovian behaviour.65 Assessing the lag time dependence of the

19

relaxation time scales implied in the system dynamics has been shown to be a useful approach to

20

determine appropriate lag times to ensure memory loss.65,66 To determine the appropriate lag time

21

for the construction of the MSM, we monitored the dependence of the time scales of the system

22

on increasing values of lag time (Implied TimeScales, or ITS, plot). To assess the reproducibility

23

of the entire procedure, the whole workflow, beginning with the cluster analysis, was repeated

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 64

1

three times. The corresponding three ITS plots were thus produced and compared. As a result, very

2

similar outcomes were obtained (Figure S4). As the Figure shows, the relaxation time scales level

3

off at a lag time corresponding to about 600 steps. Considering the trajectory saving frequency of

4

0.180 ns, our choice of 600 steps corresponded to a lag time of about 100 ns. We initially attempted

5

to aggregate all of the 12 productive and non-productive trajectories. However, this did not allow

6

observing clear convergence of the relaxation time scales. This behaviour would typically indicate

7

the presence of states not adequately sampled. Thus, we run through the simulations to identify

8

possible sources and noted that, while in 11 out of the 12 simulations the binding pathways were

9

remarkably similar, one single trajectory displayed the ligand getting access to the pocket through

10

a significantly different route. Therefore, we discarded the trajectory and repeated the analysis,

11

eventually obtaining the converged plots reported in Figure S4.

12 13

4.2 Flux analysis and prioritization scheme

14

MSM provides a discrete-state kinetic model of the process investigated that can be exploited to

15

obtain relevant thermodynamic and kinetic information. Often, however, the model still retains a

16

high level of complexity, such as it might be desirable to derive a coarse grain description thereof

17

providing the same quantitative information in a more compact fashion.44,65 Herein, this coarse-

18

graining step (also known as microstate lumping phase) was intentionally skipped, since a

19

microscopic description of the association process was required by the following PCVs

20

construction. Nevertheless, in order to identify the most relevant transitions between the bound

21

and unbound state of the protein-ligand complex, the model was further analysed with Transition-

22

Path Theory (TPT).46,73 Provided two endpoints (the encounter complex and the native pose in this

23

work), TPT aims at describing all the reactive trajectories in the microstate space provided by

ACS Paragon Plus Environment

20

Page 21 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

MSM in terms of their probability distribution and fluxes.48 A central concept in TPT is the

2

committor probability of each microstate, 𝑞𝑖+ , that is the probability for the ith state to reach state

3

B rather than A.46,47 The forward committor probability can be thus defined as:47,48

4 5

𝑞𝑖+ = ∑𝑗 ∈ 𝐵𝑇𝑖𝑗 + ∑𝑗 ∈ 𝐴 ∪ 𝐵𝑇𝑖𝑗𝑞𝑗+ ,

(1)

6 7

where 𝑇𝑖𝑗 is the transition probability between two given microstates, and 𝐴 ∪ 𝐵 is the

8

complement of the union of A and B. In equilibrium conditions, if the detailed balance is satisfied,

9

the backward committor probability is simply given by:

10 11

𝑞𝑖― = 1 ― 𝑞𝑖+ .

(2)

12 13 14

The effective flux, that is the probability to visit microstate i and j consecutively, is thus defined as:

15 16

𝑓𝑖𝑗 = 𝜌𝑖 𝑞𝑖― 𝑇𝑖𝑗𝑞𝑗+ ,

(3)

17 18 19

in which 𝜌𝑖 is the Boltzmann probability for state i. Then, the net flux from A to B can also be computed as:

20 21

𝑓𝑖𝑗+ = max (𝑓𝑖𝑗 ― 𝑓𝑗𝑖).

(4)

22 23

While the reaction rates can be in principle estimated from the total flux,47,48 we highlight here

24

that we obtained the kinetic model in a data-poor regime, hence attempting to extract any

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 64

1

quantitative observables would be unwise. Rather, we exploited TPT to obtain the most

2

representative ordered sequence of intermediates between the A and B endpoints. The network of

3

fluxes was visualized with Graphviz,74 as displayed in Figure 3. As reported in the Results and

4

Discussion section, a dominant pathway could only be detected at the initial stages of binding.

5

Hence, rather than by visual inspection, we extracted the intermediates using a less subjective

6

procedure. Following Noé et al.,47 we first decomposed the total flux in circle-free pathways. Each

7

pathway was assigned a probability based on its flux (𝑓𝐴→𝐵 ):47 𝑖

8 9

𝑓𝐴→𝐵 𝑖

= ∑ 𝑓𝐴→𝐵 𝑝𝐴→𝐵 𝑖

(5)

𝑗 𝑗

10 11

Then, each state was assigned a score based on the number of times it was observed in the

12

ensemble of possible pathways, weighted by the relative probability of the associated pathway.

13

Through this analysis, the top-ranking states I1, I2, I3, and I4 were identified together with their

14

ordered sequence.

15 16

4.3 Construction of the guess path for PCVs

17

As already pointed out, the PCVs formalism requires a frameset representing a guess path for

18

the investigated event.28 A crucial requirement for the frameset is an equal spacing in terms of

19

RMSD between neighbouring snapshots. As such, including more frames results in smaller

20

distances and increases the resolution of the mechanistic interpretation. From this standpoint, the

21

few states obtained from TPT were not well-suited to provide functional PCVs. Specifically, an

22

RMSD of 5.7 Å, 8.6 Å, 7.7 Å, 1.6 Å, and 2.8 Å was obtained for the five transitions (see also Table

23

S2). First, such large distances between some of the states would inevitably affect the resolution

ACS Paragon Plus Environment

22

Page 23 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

of the FES. Moreover, these states would not guarantee the equal spacing required to ensure a

2

smooth progression of the collective variable. Consequently, we devised a multi-step procedure in

3

which first we enriched the number of configurations between each pair of endpoints and

4

subsequently selected equally spaced configurations. As for the first step, we performed a series

5

of steered MD simulations75,76 starting in each one of the identified states. In order to plan the

6

steered MD simulations so as to start in one of the stations identified through the MSM and end in

7

the subsequent one, a series of runs were set using the RSMD as CV and targeting it to zero. In

8

addition, one more run was carried out from the first point (state A) increasing the coordination of

9

the ligand by water molecules in order to obtain additional points at a fully solvated state. We then

10

reverted this first short trajectory and merged it with the other steered MD simulation runs,

11

resulting in a 30 ns-long binding trajectory. For all the 6 runs, 5 ns-long each, force constants

12

ranging from 20 to 50 kcal/(mol Å2) were applied. We monitored the work required, and multiple

13

realizations were produced in order to ensure reproducibility of the outcomes. As we expected,

14

considering the relatively limited configurational space allowed to the ligand within the protein,

15

all the runs produced extremely similar results. Sample work profiles from such runs are shown in

16

Figure S5A. Conversely, as reported by Figure S5B, increasing the solvation of the ligand while

17

detaching from the protein opened the way to several distinct states, which were nevertheless less

18

relevant in light of the purposes of our study. Thus, in line with Favia et al.,31 we selected the run

19

in which the work was lower. Once the enriched trajectory, consisting of 30 ns of the 6 aggregated

20

steered MD simulation runs, was obtained, we proceeded to the selection of equally spaced frames.

21

We aimed at achieving an inter-frame distance of about 1 Å, which would allow distinguishing

22

relatively close configurations along the binding pathway, and in turn, achieving a reasonable

23

resolution of the outcoming FES. This step was non-trivial, since essentially translated into

ACS Paragon Plus Environment

23

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 64

1

iterating over 13,000 MD simulation snapshots to select a more contained number of frames while

2

achieving at the same time the desired spacing. To accomplish this task, we took advantage of in-

3

house scripts specifically devised for this purpose and involving inter-frames interpolation. This

4

procedure allowed us to select 80 points displaying an average inter-frame separation of 1.25 Å.

5

Finally, the guess path was parameterized using the S and Z variables developed by Branduardi et

6

al.:28

7 𝑃

8

𝑆(𝐱) =

∑𝑖 = 1𝑖 𝑒 𝑃

∑𝑖 = 1𝑒

2 ―𝜆‖𝐱 ― 𝐱𝑖‖

(6)

2 ―𝜆‖𝐱 ― 𝐱𝑖‖

9 10

and

11 12 13

𝑃

‖𝐱 ― 𝐱𝑖‖2

𝑍(𝐱) = ― 𝜆 ―1ln (∑𝑖 = 1𝑒 ―𝜆 (7)

),

14 15

in which x represents the current configuration of the system, and ‖𝐱 ― 𝐱𝑖‖ is the distance

16

between the current configuration and the ith structure of the frameset. Here, we used the RMSD

17

as a metric. In particular, the RMSD was calculated on all alprenolol’s heavy atoms, made

18

exception for the two methyl carbons of the terminal N-isopropyl moiety, after optimal alignment

19

over the alpha carbons from protein residues Cys77, Ala78, Asp79, Leu80, Val81, Met82, Gly83,

20

Leu84, Cys116, Val117, Thr118, Ala 119, Ser120, Ile121, Trp122, Thr123, Tyr209, Val210,

21

Pro211, Leu212, Ile 214, Met215, Phe282, Thr283 and Trp286. These alpha carbons belonged to

22

structured motifs and thus displayed small coordinate displacement during the simulations;

23

specifically, such condition was assessed by measuring the root mean square fluctuation of all the

ACS Paragon Plus Environment

24

Page 25 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

protein’s alpha carbons in preliminary sample MD simulation runs. In Eq. 6 and 7,  is a parameter

2

which controls the smoothness of the advancement along the path and was set to 1.48 Å-2.

3

All the data and PLUMED input files required to reproduce the results reported in this paper are

4

available on PLUMED-NEST (www.plumed-nest.org), the public repository of the PLUMED

5

consortium,77 as plumID:19.063.

6 7

4.4 Free energy profile and binding free energy

8

All the simulations were performed starting from selected configurations extracted from the

9

trajectories computed by Dror et al.37 Since, the parameters for protein and ligand could not be

10

transferred from the original topology, the entire system was re-parameterized. Thus, protein and

11

lipids were modelled according to the Amber ff14,78 which includes Lipid 14.79 As for alprenolol,

12

this was parameterized according to the general Amber force field (GAFF).80 Partial charges were

13

assigned to the ligand through the widely employed RESP procedure.81 In particular, geometrical

14

optimization and further calculation of the electrostatic potential were carried out via quantum

15

mechanics using the 6-31G* basis set at the Hartree-Fock level of theory; the NWChem software

16

was employed for this stage.82 The TIP3P model was used for the waters83 and an appropriate

17

number of ions, parameterized as in Joung and Cheatham,84 were added to neutralize the system.

18

All of the simulations were performed using the 4.6.7 version of the GROMACS MD simulation

19

engine85 patched with the 2.1.1 version of PLUMED.86

20

Production runs were performed in the NPT statistical ensemble using a 2 fs time-step. A cut-

21

off of 12 Å was used for non-bonded interactions, while long-range electrostatics was treated with

22

the Particle-Mesh Ewald scheme employing a grid spacing of 1.6 Å and a fourth order spline.87

23

The nominal temperature of 310 K was controlled with the Nosé-Hoover thermostat88,89 separately

ACS Paragon Plus Environment

25

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 64

1

coupling solutes, membrane and solvent with a time constant of 0.5 ps. The Parrinello-Rahman90,91

2

barostat was used to control the pressure (1 bar) using a semi-isotropic coupling and a time constant

3

of 5 ps. Bond lengths for chemical bonds involving hydrogens were restrained to their equilibrium

4

values with the LINCS algorithm.92

5

Well-tempered metadynamics27 was performed for a total time of about 2 μs. Gaussians with a

6

nominal height of 0.2 kcal/mol were used together with a bias factor of 15 units. The width of the

7

Gaussians was set to 0.1 and 0.01 nm2 along S and Z, respectively, and the space along the Z

8

dimension was limited to 0.16 nm2. The Gaussians deposition time was set to 500 molecular

9

dynamics steps. The statistical uncertainty was assessed through block analysis using PLUMED.93

10

This procedure allows providing an estimate of the error in the free energy as a function of pre-

11

defined CVs. Specifically, using the MD simulation trajectory obtained from the metadynamics

12

simulation, the free energy is computed in the CV space for independent and sequential blocks of

13

trajectory frames of fixed length. As a first step, each trajectory frame is assigned a weight

14

according to an umbrella-sampling-like reweighting approach.94 Subsequently, a binning of the

15

CV space is carried out, where each bin is assigned a free energy value that is computed from the

16

frame weights. Finally, for each bin, the average free energy over all the blocks can be determined

17

along with the associated error. Notably, the error is estimated from the fluctuations of the weights

18

over the block ensemble. The procedure was then repeated using different block sizes and the

19

corresponding average for the free energy error is computed from the errors associated with each

20

bin. As a typical behavior, an increase in the average error is observed as the block size becomes

21

larger. However, such dependence ceases once the correlation between the data, which is an

22

intrinsic attribute of MD simulation trajectories, is lost. As a result, the average error reaches a

23

plateau region, which corresponds to the error associated with the computed free energy. The

ACS Paragon Plus Environment

26

Page 27 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

described protocol was applied to our metadynamics trajectory data employing a 200 x 40 binning

2

scheme along the S and Z CVs, respectively. Blocks of increasing size, up to 7000 frames, were

3

considered. The average error as a function of the block size is reported in Figure S6. As shown in

4

the plot, the error tends to a plateau region when the block size includes about 6000 frames

5

(corresponding of a simulation time of 12 ns). Thus, the corresponding average error value

6

provided a reasonable estimate of the uncertainty associated with the free energy computed from

7

our metadynamics simulations.

8

The standard binding free energy of Alprenolol to the 2-AR was estimated following the

9

approach proposed by Saladino et al.32 In particular, by partitioning the bound/unbound state at

10

approximately S = 20, the corrected binding free energy can be formulated as follows:49

11 12

Δ𝐺°𝑏 = Δ𝐹meta +Δ𝐹°corr≅ ― 𝑅𝑇ln

∫sited𝑆 d𝑍 e ∫bulkd𝑆 d𝑍 e



𝐹(𝑆,𝑍) 𝑅𝑇

𝐹(𝑆,𝑍) ― 𝑅𝑇

― 𝑅𝑇ln

𝑉meta bulk 𝑉°

(8)

13 14

The first term of Eq. 8 represents the probability ratio between the bound and unbound state of

15

the ligand as reconstructed by well-tempered metadynamics, while the second term is a correction

16

taking into account the fact that in the unbound state only a fraction of the volume of the box was

17

accessible to the ligand. This was due to the finite length of S and the confinement along Z, and it

18

was numerically estimated to be on the order of about 524 Å3. The second term of Eq. 9 also

19

involves the reference of standard state volume (or 1 M concentration), equal to 1660 Å3. Thus,

20

considering the explicit contribution from metadynamics simulations and the entropic cost due to

21

volumetric confinement in the unbound state, a value of – 11.6 kcal/mol was estimated (Δ𝐹meta =

ACS Paragon Plus Environment

27

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 64

1

– 12.3 kcal/mol and Δ𝐹°corr = 0.7 kcal/mol), in very good agreement with the experimental binding

2

free energy (−12.2 kcal/mol).37

3 4

4.5 Caveats and recommendations for the use of the MSM-Path Metadynamics approach

5

We conclude this section by briefly summarizing some practical suggestions for future

6

applications of the presented methodology. First and foremost, it should be ensured that at least

7

few spontaneous and complete binding events are observed in the initial trajectory set

8

obtained via conventional MD simulations. Alternatively, several disjoint trajectories can

9

be aggregated and analysed as a whole, provided that the entire process can be

10

reconstructed with sufficient accuracy. While as a proof-of-concept we employed here

11

trajectories that were already available, one could consider limiting the sampling of the

12

unbound state through volumetric confinement closer to the binding site.61 This strategy

13

would improve the sampling efficiency of conventional MD simulations, and it is not

14

expected to be critical considering that subsequent refinement is then achieved through

15

the metadynamics resampling. In case spontaneous binding events are recorded, care

16

must be taken in whether to include all the obtained information in the construction of

17

PCVs. Indeed, while greater sampling of overlapping regions of configurational space

ACS Paragon Plus Environment

28

Page 29 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

certainly improves the statistical significance of the PCVs space under construction, the

2

sampling of highly dissimilar binding pathways might introduce additional uncertainty in

3

the MSM rather than alleviate it. In the present work, the individual trajectory responsible

4

for this behaviour was spotted by visual inspection and previous knowledge,37 but re-

5

computing the ITC plots in a leave-one-out fashion (i.e. discarding each time a different

6

trajectory set) would be a viable option to identify an undesirable outlier. Alternatively,

7

one may also take advantage of recently proposed trajectory analysis tools aimed at

8

automatically classifying the similarity of (un)binding pathways.57 In any case, particular

9

care should be addressed to the selection and assessment of appropriate variables that

10

are used to define the microstates of the MSM. This can for instance be achieved by

11

analysing the outcome of the clustering procedure, by ensuring that no relevant structural

12

changes are taking place within the same cluster in orthogonal degrees of freedom. In

13

general, the number of clusters for the microstate definition should be taken as low as

14

possible to avoid microstates with too few counts, condition which is necessary to build a

15

reliable transition probability matrix,65 but sufficiently large to geometrically discriminate

16

between different microstates. Finally, in the construction of the guess path, whose

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 64

1

essential requirement is that the inter-frame separation is as constant as possible, an

2

RMSD difference in the order of 1 Å would be desirable to achieve a reasonable resolution

3

and thus correctly map configurations along the S variable. As for the Z PCV, sampling

4

of values larger than16 Å2 are discouraged, since this would correspond to configurations

5

of about 4 Å RMSD difference from the corresponding S value, leading to highly

6

degenerate states for the same point in the PCV space.

7

8

FIGURES

9

Figure 1. The MSM-Path Metadynamics approach described in this work.

10

Figure 2. Orthogonal views of the the 2-AR embedded in a lipid bilayer. The protein is

11

shown in cartoons color-coded according to the -helix regions. In the top panel, the

12

alprenolol binding is depicted tracing its center-of-mass (orange spheres) along the whole

13

set of reactive trajectories.

ACS Paragon Plus Environment

30

Page 31 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Figure 3. Network of fluxes for the alprenolol binding to the 2-AR (A) Network of fluxes

2

calculated through TPT. Each node represents a single microstate and its size is

3

proportional to the connections. Conversely, the size of the edges is proportional to the

4

flux. (B) A close-up view of the most relevant regions of the network, showing the four

5

intermediate states mostly involved in the transition from the EC to bound state (endpoints

6

A and B, respectively).

7

Figure 4. Retrospective validation of the MSM-derived path though projection of the

8

conventional-MD simulation trajectories on the PCVs space. Each color of the grayscale,

9

from black to light grey (labels A0-C9 in the upper left corner of the plot), identifies one of

10

the 11 different trajectories used to build the MSM. The larger colored dots, labelled A,

11

I1-4, and B, are the projection of the 6 states used as templates to construct the frameset.

12

These are consistent with the labels used in Figure 2. The corresponding configurations

13

are depicted right below the plot, labeled using the same color code. Conversely, right

14

above the plot, sample structures are shown for dense regions in the PCVs space

15

depicting either alternative binding pathways or non-productive trajectories.

ACS Paragon Plus Environment

31

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 64

1

Figure 5. Mechanistic interpretation of the drug binding process. (A) Free energy surface

2

for the alprenolol binding to the 2-AR as reconstructed by metadynamics along the guess

3

path. Iso-lines are plotted each 2 kcal/mol. The main metastable states are reported

4

together with selected configurations. The minimum free energy pathway is displayed in

5

red (solid line) along with an alternative route (dashed line). (B) Free energy profile along

6

the MFEP (black line) together with the associated statistical uncertainty (gray bars). (C)

7

Focus on the alprenolol rearrangement upon the M3 to M6 transition (red to blue color

8

scale, nitrogen atom highlighted in yellow).

9

Figure 6. Relevant states along the Minimum Free Energy Path. (A) One representative

10

configuration per relevant minima. The color gradient goes from dark red (EC formed) to

11

blue (crystal pose). (B) Representative configuration for the M1 state. (C) Representative

12

configuration for the M3 state. (D) Half-turn rotation along the ligand main axis to move

13

from M3 (darker pink) to M4 (lighter pink). (E) Representative configuration for the M5

14

and M6 states. (F) Representative configuration for the M7 state. (G) The 4 Å

15

displacement to go from M8 (white) to M9 (azure). (H) Representative configuration for

ACS Paragon Plus Environment

32

Page 33 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

the M11 state. In panels (B) to (H), ligand color code is consistent with panel (A); ligand

2

polar atoms are highlighted as spheres (blue for nitrogen, red for oxygen) and polar

3

hydrogens are also shown.

4

Figure 7. Assessing the response of residues Phe193 and Tyr308 to alprenolol passage

5

through the narrow crevice. (A) Phenyl-phenyl side-chains’ distance between Phe193 and

6

Tyr308 plotted against simulation time (red line, left y-axis). The instantaneous value of

7

S during metadynamics simulation is also shown (black line, right y-axis). (B) Scatter plot

8

of S plotted against the phenyl-phenyl side-chains’ distance. (C) Representative

9

configurations extracted from the metadynamics simulation showing the phenyl-phenyl

10

gate in the closed, open, and wide-open states (left, center, and right, respectively).

11

Figure 8. The alternative binding path. Configurations sampled in the alternative path,

12

namely M10’ (A) and M12’ (B). alprenolol is colored in yellow.

13 14

ASSOCIATED CONTENT

ACS Paragon Plus Environment

33

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 64

1

Supporting Information. The following files are available free of charge.

2

Supporting Figures: time evolution of free energy differences along metadynamics run;

3

details on the protein residues and ligand atoms used to parameterize PCVs; evaluation

4

of the significance of clusters used to build the MSM; ITS plots; outcome of steered MD

5

simulations; block analysis. Supporting Tables: summary of the trajectory set; RMSD

6

matrix of the intermediates identified through TPT.

7

8

AUTHOR INFORMATION

9

Corresponding Author

10

*E-mail: [email protected]

11

*E-mail: [email protected]

12

Present Addresses

13

‡Present Address: Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265,

14

Trieste, Italy

ACS Paragon Plus Environment

34

Page 35 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Author Contributions

2

The manuscript was written through contributions of all authors. All authors have given

3

approval to the final version of the manuscript.

4

Notes

5

A.C. is co-founder of BiKi Technologies, a startup company that develops methods based

6

on molecular dynamics simulations and related approaches for investigating protein-

7

ligand (un)binding. R.E.A. has equity interest in, and is a cofounder and on the scientific

8

advisory board of, Actavalon, Inc.

9

ACKNOWLEDGMENT

10

We wish to thank Robert D. Malmstrom, Davide Branduardi, Martina Bertazzo,

11

Mariarosaria Ferraro, and Luca Bellucci for useful discussions and technical support.

12

Andrea Spitaleri is also gratefully acknowledged for providing the initial system topology.

13

14

REFERENCES

(1)

Dror, R. O.; Mildorf, T. J.; Hilger, D.; Manglik, A.; Borhani, D. W.; Arlow, D. H.;

ACS Paragon Plus Environment

35

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 64

1

Philippsen, A.; Villanueva, N.; Yang, Z.; Lerch, M. T.; Hubbell, W. L.; Kobilka, B. K.;

2

Sunahara, R. K.; Shaw, D. E. Structural Basis for Nucleotide Exchange in

3

Heterotrimeric G Proteins. Science. 2015, 348, 1361–1365.

4

(2)

Malmstrom, R. D.; Kornev, A. P.; Taylor, S. S.; Amaro, R. E. Allostery through the

5

Computational Microscope: CAMP Activation of a Canonical Signalling Domain.

6

Nat. Commun. 2015, 6, 7588.

7

(3)

Masetti, M.; Xie, H.; Krpetić, Ž.; Recanatini, M.; Alvarez-Puebla, R. A.; Guerrini, L.

8

Revealing DNA Interactions with Exogenous Agents by Surface-Enhanced Raman

9

Scattering. J. Am. Chem. Soc. 2015, 137, 469–476.

10

(4)

Bernetti, M.; Masetti, M.; Pietrucci, F.; Blackledge, M.; Jensen, M. R.; Recanatini,

11

M.; Mollica, L.; Cavalli, A. Structural and Kinetic Characterization of the Intrinsically

12

Disordered Protein SeV NTAIL through Enhanced Sampling Simulations. J. Phys.

13

Chem. B 2017, 121, 9572–9582.

14 15

(5)

Amaro, R. E.; Mulholland, A. J. Multiscale Methods in Drug Design Bridge Chemical and Biological Complexity in the Search for Cures. Nat. Rev. Chem. 2018, 2, 148.

ACS Paragon Plus Environment

36

Page 37 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

(6)

Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035–4061.

2

3

De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and

(7)

Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson,

4

S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.;

5

Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.;

6

Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.;

7

Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative

8

Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-

9

Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015, 137, 2695–

10

11

2703.

(8)

Salomon-Ferrer, R.; Götz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. Routine

12

Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit

13

Solvent Particle Mesh Ewald. J. Chem. Theory Comput. 2013, 9, 3878–3888.

14 15

(9)

Kutzner, C.; Páll, S.; Fechner, M.; Esztermann, A.; de Groot, B. L.; Grubmüller, H. Best Bang for Your Buck: GPU Nodes for GROMACS Biomolecular Simulations. J.

ACS Paragon Plus Environment

37

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Page 38 of 64

Comput. Chem. 2015, 36, 1990–2008.

2

(10) Harvey, M. J.; Giupponi, G.; Fabritiis, G. De. ACEMD: Accelerating Biomolecular

3

Dynamics in the Microsecond Time Scale. J. Chem. Theory Comput. 2009, 5,

4

1632–1639.

5

(11) Lee, T.-S.; Cerutti, D. S.; Mermelstein, D.; Lin, C.; LeGrand, S.; Giese, T. J.;

6

Roitberg, A.; Case, D. A.; Walker, R. C.; York, D. M. GPU-Accelerated Molecular

7

Dynamics and Free Energy Methods in Amber18: Performance Enhancements and

8

New Features. J. Chem. Inf. Model. 2018, 58, 2043–2050.

9

(12) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood,

10

M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-

11

Level Characterization of the Structural Dynamics of Proteins. Science. 2010, 330,

12

341–346.

13 14

(13) Shirts, M.; Pande, V. S. Screen Savers of the World Unite! Science. 2000, 290, 1903–1904.

ACS Paragon Plus Environment

38

Page 39 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2

Journal of Chemical Theory and Computation

(14) Gioia, D.; Bertazzo, M.; Recanatini, M.; Masetti, M.; Cavalli, A. Dynamic Docking: A Paradigm Shift in Computational Drug Discovery. Molecules . 2017, 22, 1−21..

3

(15) Votapka, L. W.; Amaro, R. E. Multiscale Estimation of Binding Kinetics Using

4

Brownian Dynamics, Molecular Dynamics and Milestoning. PLOS Comput. Biol.

5

2015, 11, e1004381.

6

(16) Votapka, L. W.; Jagger, B. R.; Heyneman, A. L.; Amaro, R. E. SEEKR: Simulation

7

Enabled Estimation of Kinetic Rates, A Computational Tool to Estimate Molecular

8

Kinetics and Its Application to Trypsin–Benzamidine Binding. J. Phys. Chem. B

9

2017, 121, 3597–3606.

10

(17) Jagger, B. R.; Lee, C. T.; Amaro, R. E. Quantitative Ranking of Ligand Binding

11

Kinetics with a Multiscale Milestoning Simulation Approach. J. Phys. Chem. Lett.

12

2018, 9, 4941–4948.

13

(18) Teo, I.; Mayne, C. G.; Schulten, K.; Lelièvre, T. Adaptive Multilevel Splitting Method

14

for Molecular Dynamics Calculation of Benzamidine-Trypsin Dissociation Time. J.

15

Chem. Theory Comput. 2016, 12, 2983–2989.

ACS Paragon Plus Environment

39

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 64

1

(19) Lotz, S. D.; Dickson, A. Unbiased Molecular Dynamics of 11 Min Timescale Drug

2

Unbinding Reveals Transition State Stabilizing Interactions. J. Am. Chem. Soc.

3

2018, 140, 618–628.

4 5

6 7

8 9

10 11

12 13

14

(20) Dickson, A.; Lotz, S. D. Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore. Biophys. J. 2017, 112, 620–629.

(21) Dickson, A. Mapping the Ligand Binding Landscape. Biophys. J. 2018, 115, 1707– 1719.

(22) Bernetti, M.; Masetti, M.; Rocchia, W.; Cavalli, A. Kinetics of Drug Binding and Residence Time. Annu. Rev. Phys. Chem. 2019, 70, 143–71.

(23) Borhani, D. W.; Shaw, D. E. The Future of Molecular Dynamics Simulations in Drug Discovery. J. Comput. Aided. Mol. Des. 2012, 26, 15–26.

(24) Betz, R. M.; Dror, R. O. How Effectively Can Adaptive Sampling Methods Capture Spontaneous Ligand Binding? J. Chem. Theory Comput. 2019, 15, 2053–2063.

(25) Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. 2002,

ACS Paragon Plus Environment

40

Page 41 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

99, 12562–12566.

2

(26) Abrams, C.; Bussi, G. Enhanced Sampling in Molecular Dynamics Using

3

Metadynamics, Replica-Exchange, and Temperature-Acceleration. Entropy . 2014,

4

16, 163–199.

5

(27) Barducci, A.; Bussi, G.; Parrinello, M. Well-Tempered Metadynamics: A Smoothly

6

Converging and Tunable Free-Energy Method. Phys. Rev. Lett. 2008, 100, 020603.

7

(28) Branduardi, D.; Gervasio, F. L.; Parrinello, M. From A to B in Free Energy Space.

8

J. Chem. Phys. 2007, 126, 054103

9

(29) Berteotti, A.; Cavalli, A.; Branduardi, D.; Gervasio, F. L.; Recanatini, M.; Parrinello,

10

M. Protein Conformational Transitions: The Closure Mechanism of a Kinase

11

Explored by Atomistic Simulations. J. Am. Chem. Soc. 2009, 131, 244–250.

12

(30) Ceccarini, L.; Masetti, M.; Cavalli, A.; Recanatini, M. Ion Conduction through the

13

14

HERG Potassium Channel. PLoS One 2012, 7, e49017.

(31) Favia, A. D.; Masetti, M.; Recanatini, M.; Cavalli, A. Substrate Binding Process and

ACS Paragon Plus Environment

41

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 64

1

Mechanistic Functioning of Type 1 11β-Hydroxysteroid Dehydrogenase from

2

Enhanced Sampling Methods. PLoS One 2011, 6, e25375.

3

(32) Saladino, G.; Gauthier, L.; Bianciotto, M.; Gervasio, F. L. Assessing the

4

Performance of Metadynamics and Path Variables in Predicting the Binding Free

5

Energies of P38 Inhibitors. J. Chem. Theory Comput. 2012, 8, 1165–1170.

6

(33) Colizzi, F.; Masetti, M.; Recanatini, M.; Cavalli, A. Atomic-Level Characterization of

7

the Chain-Flipping Mechanism in Fatty-Acids Biosynthesis. J. Phys. Chem. Lett.

8

2016, 7, 2899–2904.

9 10

(34) Zinovjev, K.; Tuñón, I. Reaction Coordinates and Transition States in Enzymatic Catalysis. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2017, 8, e1329.

11

(35) Shan, Y.; Kim, E. T.; Eastwood, M. P.; Dror, R. O.; Seeliger, M. A.; Shaw, D. E.

12

How Does a Drug Molecule Find Its Target Binding Site? J. Am. Chem. Soc. 2011,

13

133, 9181–9183.

14

(36) Buch, I.; Giorgino, T.; De Fabritiis, G. Complete Reconstruction of an Enzyme-

ACS Paragon Plus Environment

42

Page 43 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Inhibitor Binding Process by Molecular Dynamics Simulations. Proc. Natl. Acad.

2

Sci. 2011, 108, 10184–10189.

3

(37) Dror, R. O.; Pan, A. C.; Arlow, D. H.; Borhani, D. W.; Maragakis, P.; Shan, Y.; Xu,

4

H.; Shaw, D. E. Pathway and Mechanism of Drug Binding to G-Protein-Coupled

5

Receptors. Proc. Natl. Acad. Sci. 2011, 108, 13118–13123.

6 7

(38) Husic, B. E.; Pande, V. S. Markov State Models: From an Art to a Science. J. Am.

Chem. Soc. 2018, 140, 2386–2396.

8

(39) Doerr, S.; De Fabritiis, G. On-the-Fly Learning and Sampling of Ligand Binding by

9

High-Throughput Molecular Simulations. J. Chem. Theory Comput. 2014, 10,

10

2064–2069.

11

(40) Held, M.; Metzner, P.; Prinz, J.-H.; Noé, F. Mechanisms of Protein-Ligand

12

Association and Its Modulation by Protein Mutations. Biophys. J. 2011, 100, 701–

13

710.

14

(41) Pan, A. C.; Roux, B. Building Markov State Models along Pathways to Determine

ACS Paragon Plus Environment

43

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 64

1

Free Energies and Rates of Transitions. J. Chem. Phys. 2008, 129, 64107.

2

(42) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot,

3

C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable Molecular Dynamics with NAMD.

4

J. Comput. Chem. 2005, 26, 1781–1802.

5

(43) Shukla, D.; Meng, Y.; Roux, B.; Pande, V. S. Activation Pathway of Src Kinase

6

Reveals Intermediate States as Targets for Drug Design. Nat. Commun. 2014, 5,

7

3397.

8 9

(44) Pande, V. S.; Beauchamp, K.; Bowman, G. R. Everything You Wanted to Know about Markov State Models but Were Afraid to Ask. Methods 2010, 52, 99–105.

10

(45) Scherer, M. K.; Trendelkamp-Schroer, B.; Paul, F.; Pérez-Hernández, G.;

11

Hoffmann, M.; Plattner, N.; Wehmeyer, C.; Prinz, J.-H.; Noé, F. PyEMMA 2: A

12

Software Package for Estimation, Validation, and Analysis of Markov Models. J.

13

Chem. Theory Comput. 2015, 11, 5525–5542.

14

(46) E, W.; Vanden-Eijnden, E. Transition-Path Theory and Path-Finding Algorithms for

ACS Paragon Plus Environment

44

Page 45 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

the Study of Rare Events. Annu. Rev. Phys. Chem. 2010, 61, 391–420.

2

(47) Noé, F.; Schütte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. R. Constructing the

3

Equilibrium Ensemble of Folding Pathways from Short Off-Equilibrium Simulations.

4

Proc. Natl. Acad. Sci. 2009, 106, 19011–19016.

5 6

7 8

(48) Meng, Y.; Shukla, D.; Pande, V. S.; Roux, B. Transition Path Theory Analysis of CSrc Kinase Activation. Proc. Natl. Acad. Sci. 2016, 113, 9193–9198.

(49) General, I. J. A Note on the Standard State’s Binding Free Energy. J. Chem. Theory

Comput. 2010, 6, 2520–2524.

9

(50) Wacker, D.; Fenalti, G.; Brown, M. A.; Katritch, V.; Abagyan, R.; Cherezov, V.;

10

Stevens, R. C. Conserved Binding Mode of Human Β2 Adrenergic Receptor Inverse

11

Agonists and Antagonist Revealed by X-Ray Crystallography. J. Am. Chem. Soc.

12

2010, 132, 11443–11445.

13

(51) Marcos-Alcalde, I.; Setoain, J.; Mendieta-Moreno, J. I.; Mendieta, J.; Gómez-

14

Puertas, P. MEPSA: Minimum Energy Pathway Analysis for Energy Landscapes.

ACS Paragon Plus Environment

45

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

2

Page 46 of 64

Bioinformatics 2015, 31, 3853–3855.

(52) Dickson, A.; Brooks

3rd, C. L. WExplore: Hierarchical Exploration of High-

3

Dimensional Spaces Using the Weighted Ensemble Algorithm. J. Phys. Chem. B

4

2014, 118, 3532–3542.

5 6

(53) Huber, G. A.; Kim, S. Weighted-Ensemble Brownian Dynamics Simulations for Protein Association Reactions. Biophys. J. 1996, 70, 97–110.

7

(54) Mollica, L.; Decherchi, S.; Zia, S. R.; Gaspari, R.; Cavalli, A.; Rocchia, W. Kinetics

8

of Protein-Ligand Unbinding via Smoothed Potential Molecular Dynamics

9

Simulations. Sci. Rep. 2015, 5, 11539.

10

(55) Kokh, D.; Amaral, M.; Bomke, J.; Grädler, U.; Musil, D.; Buchstaller, H.-P.; Dreyer,

11

M.; Frech, M.; Lowinski, M.; Vallee, F.; Bianciotto, M.; Rak, A.; C Wade, R.

12

Estimation of Drug-Target Residence Times by τ-Random Acceleration Molecular

13

Dynamics Simulations. J. Chem. Theory Comput. 2018, 14, 3859−3869.

14

(56) Bernetti, M.; Rosini, E.; Mollica, L.; Masetti, M.; Pollegioni, L.; Recanatini, M.;

ACS Paragon Plus Environment

46

Page 47 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Cavalli, A. Binding Residence Time through Scaled Molecular Dynamics: A

2

Prospective Application to HDAAO Inhibitors. J. Chem. Inf. Model. 2018, 58, 2255–

3

2265.

4

(57) Schuetz, D. A.; Bernetti, M.; Bertazzo, M.; Musil, D.; Eggenweiler, H.-M.;

5

Recanatini, M.; Masetti, M.; Ecker, G. F.; Cavalli, A. Predicting Residence Time and

6

Drug Unbinding Pathway through Scaled Molecular Dynamics. J. Chem. Inf. Model.

7

2019, 59, 535–549.

8

(58) Sabbadin, D.; Moro, S. Supervised Molecular Dynamics (SuMD) as a Helpful Tool

9

To Depict GPCR-Ligand Recognition Pathway in a Nanosecond Time Scale. J.

10

Chem. Inf. Model. 2014, 54, 372−376.

11

(59) Cuzzolin, A.; Sturlese, M.; Deganutti, G.; Salmaso, V.; Sabbadin, D.; Ciancetta, A.;

12

Moro, S. Deciphering the Complexity of Ligand-Protein Recognition Pathways

13

Using Supervised Molecular Dynamics (SuMD) Simulations. J. Chem. Inf. Model.

14

2016, 56, 4687-4705

15

(60) Spitaleri, A.; Decherchi, S.; Cavalli, A.; Rocchia, W. Fast Dynamic Docking Guided

ACS Paragon Plus Environment

47

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 64

1

by Adaptive Electrostatic Bias: The MD-Binding Approach. J. Chem. Theory

2

Comput. 2018, 143, 1727-1736.

3

(61) Bertazzo, M.; Bernetti, M.; Recanatini, M.; Masetti, M.; Cavalli, A. Fully Flexible

4

Docking via Reaction-Coordinate-Independent Molecular Dynamics Simulations. J.

5

Chem. Inf. Model. 2018, 58, 490–500.

6

(62) Linker, S. M.; Magarkar, A.; Koefinger, J.; Hummer, G.; Seeliger, D. Fragment

7

Binding Pose Predictions Using Unbiased Simulations and Markov-State-Models.

8

J. Chem. Theory Comput. 2019.

9

(63) Robustelli, P.; Piana, S.; Shaw, D. E. Developing a Molecular Dynamics Force Field

10

for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. 2018, 115,

11

4758–4766.

12

(64) Haldar, S.; Comitani, F.; Saladino, G.; Woods, C.; van der Kamp, M. W.;

13

Mulholland, A. J.; Gervasio, F. L. A Multiscale Simulation Approach to Modeling

14

Drug–Protein Binding Kinetics. J. Chem. Theory Comput. 2018, 14 , 6093–6101.

ACS Paragon Plus Environment

48

Page 49 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2

Journal of Chemical Theory and Computation

(65) Bowman, G. R. An overview and practical guide to buildingMarkov state models.

Adv. Exp. Med. Biol. 2014, 797, 7−22.

3

(66) Prinz, J.-H.; Wu, H.; Sarich, M.; Keller, B.; Senne, M.; Held, M.; Chodera, J. D.;

4

Schütte, C.; Noé, F. Markov Models of Molecular Kinetics: Generation and

5

Validation. J. Chem. Phys. 2011, 134, 174105.

6

(67) Plattner, N.; Noé, F. Protein Conformational Plasticity and Complex Ligand-Binding

7

Kinetics Explored by Atomistic Simulations and Markov Models. Nat. Commun.

8

2015, 6, 7653.

9

(68) MacQueen, J. Some Methods for Classification and Analysis of Multivariate

10

Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical

11

Statistics and Probability, Volume 1: Statistics; Le Cam, L. M. and Neyman, J., Eds.;

12

University of California Press: Berkeley, Calif., 1967; pp 281–297.

13

(69) Plattner, N.; Doerr, S.; De Fabritiis, G.; Noé, F. Complete Protein–Protein

14

Association Kinetics in Atomic Detail Revealed by Molecular Dynamics Simulations

15

and Markov Modelling. Nat. Chem. 2017, 9, 1005.

ACS Paragon Plus Environment

49

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 50 of 64

1

(70) Pinamonti, G.; Zhao, J.; Condon, D. E.; Paul, F.; Noè, F.; Turner, D. H.; Bussi, G.

2

Predicting the Kinetics of RNA Oligonucleotides Using Markov State Models. J.

3

Chem. Theory Comput. 2017, 13, 926–934.

4

(71) Pérez-Hernández, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification

5

of Slow Molecular Order Parameters for Markov Model Construction. J. Chem.

6

Phys. 2013, 139, 15102.

7 8

(72) Chodera, J. D.; Noé, F. Markov State Models of Biomolecular Conformational Dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135–144.

9

(73) Da, L.-T.; Sheong, F. K.; Silva, D.-A.; Huang, X. Application of Markov State Models

10

to Simulate Long Timescale Dynamics of Biological Macromolecules BT - Protein

11

Conformational Dynamics; Han, K., Zhang, X., Yang, M., Eds.; Springer

12

International Publishing: Cham, 2014; pp 29–66.

13

(74) Ellson, J.; Gansner, E.; Koutsofios, L.; North, S. C.; Woodhull, G.; Mutzel, P.;

14

Junger, M.; Leipert, S. Graphviz – open source graph drawing tools. Lect. Notes

15

Comput. Sci. 2002, 2265, 483–484.

ACS Paragon Plus Environment

50

Page 51 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1 2

(75) Grubmüller, H.; Heymann, B.; Tavan, P. Ligand Binding: Molecular Mechanics

3

Calculation of the Streptavidin-Biotin Rupture Force. Science. 1996, 271, 997–999.

4

(76) Izrailev, S.; Stepaniants, S.; Isralewitz, B.; Kosztin, D.; Lu, H.; Molnar, F.; Wriggers,

5

W.; Schulten, K. Computational Molecular Dynamics: Challenges, Methods, Ideas.

6

In Proceedings of the 2nd International Symposium on Algorithms for

7

Macromolecular Modelling, first edition; Deuflhard, P., Hermans, J., Leimkuhler, B.,

8

Mark, A. E., Reich, S., Skeel, R. D., Eds.; Springer Berlin Heidelberg: Berlin,

9

Heidelberg, 1997; pp 39–65.

10

(77) Bonomi, M.; Bussi, G.; Camilloni, C.; Tribello G. A.; Banáš P.; Barducci, A.; Bernetti,

11

M.; Bolhuis, P. G.; Bottaro, S.; Branduardi, D.; Capelli, R.; Carloni, P.; Ceriotti, M.;

12

Cesari, A.; Chen, H.; Chen, W.; Colizzi, F.; De, S.; De La Pierre, M.; Donadio, D.;

13

Drobot, V.; Ensing, B.; Ferguson, A. L.; Filizola, M.; Fraser, J. S.; Fu, H.;

14

Gasparotto, P.; Gervasio, F. L.; Giberti, F.; Gil-Ley, A.; Giorgino, T.; Heller, G. T.;

15

Hocky, G. M.; Iannuzzi, M.; Invernizzi, M.; Jelfs, K. E.; Jussupow, A.; Kirilin, E.;

ACS Paragon Plus Environment

51

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 52 of 64

1

Laio, A.; Limongelli, V.; Lindorff-Larsen, K.; Löhr, T.; Marinelli, F.; Martin-Samos,

2

L.; Masetti, M.; Meyer, R.; Michaelides, A.; Molteni, C.; Morishita, T.; Nava, M.;

3

Paissoni, C.; Papaleo, E.; Parrinello, M.; Pfaendtner, J.; Piaggi, P.; Piccini, G.M.;

4

Pietropaolo, A.; Pietrucci, F.; Pipolo, S.; Provasi, D.; Quigley, D.; Raiteri, P.;

5

Raniolo, S.; Rydzewski, J.; Salvalaglio, M.; Sosso, G. C.; Spiwok, V.; Šponer, J.;

6

Swenson, D. W. H.; Tiwary, P.; Valsson, O.; Vendruscolo, M.; Voth, G. A.; White,

7

A. . Promoting Transparency and Reproducibility in Enhanced Molecular

8

Simulations. Nat. Methods 2019, 16, 670–673.

9

(78) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.;

10

Simmerling, C. Ff14SB: Improving the Accuracy of Protein Side Chain and

11

Backbone Parameters from Ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–

12

3713.

13

(79) Dickson, C. J.; Madej, B. D.; Skjevik, Å. A.; Betz, R. M.; Teigen, K.; Gould, I. R.;

14

Walker, R. C. Lipid14: The Amber Lipid Force Field. J. Chem. Theory Comput.

15

2014, 10, 865–879.

ACS Paragon Plus Environment

52

Page 53 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

(80) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development

2

and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–

3

1174.

4

(81) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-Behaved Electrostatic

5

Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The

6

RESP Model. J. Phys. Chem. 1993, 97, 10269–10280.

7

(82) Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H.

8

J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A. NWChem: A

9

Comprehensive and Scalable Open-Source Solution for Large Scale Molecular

10

Simulations. Comput. Phys. Commun. 2010, 181, 1477–1489.

11

(83) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L.

12

Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem.

13

Phys. 1983, 79, 926–935.

14

(84) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion

15

Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem.

ACS Paragon Plus Environment

53

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Page 54 of 64

B 2008, 112, 9020–9041.

2

(85) Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M.

3

R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS

4

4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation

5

Toolkit. Bioinforma. 2013, 29, 845–854.

6

(86) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2:

7

New Feathers for an Old Bird. Comput. Phys. Commun. 2014, 185, 604–613.

8

(87) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A

9

Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577–8593.

10

(88) Nosé, S. A Molecular Dynamics Method for Simulations in the Canonical Ensemble.

11

12 13

14

Mol. Phys. 1984, 52, 255–268.

(89) Hoover, W. G. Canonical Dynamics: Equilibrium Phase-Space Distributions. Phys.

Rev. A 1985, 31, 1695–1697.

(90) Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New

ACS Paragon Plus Environment

54

Page 55 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

2 3

Journal of Chemical Theory and Computation

Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182–7190.

(91) Nosé, S.; Klein, M. L. Constant Pressure Molecular Dynamics for Molecular Systems. Mol. Phys. 1983, 50, 1055–1076.

4

(92) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A Linear

5

Constraint Solver for Molecular Simulations. J. Comput. Chem. 1998, 18, 1463–

6

1472.

7

(93) Grossfield, A.; Zuckerman, D. M. Ann. Rep. Comp. Chem. 2009, 5, 23–48. (94)

8

Branduardi, D.; Bussi, G.; Parrinello, M. Metadynamics with Adaptive Gaussians.

9

J. Chem. Theory Comput. 2012, 8, 2247–2254.

10

ACS Paragon Plus Environment

55

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Page 56 of 64

TOC Graphics

2

ACS Paragon Plus Environment

56

Page 57 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1. The MSM-Path Metadynamics approach described in this work. 171x52mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. Orthogonal views of the the β2-AR embedded in a lipid bilayer. The protein is shown in cartoons color-coded according to the α-helix regions. In the top panel, the Alprenolol binding is depicted tracing its center-of-mass (orange spheres) along the whole set of reactive trajectories.

ACS Paragon Plus Environment

Page 58 of 64

Page 59 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3. Network of fluxes for the Alprenolol binding to the β2-AR (A) Network of fluxes calculated through TPT. Each node represents a single microstate and its size is proportional to the connections. Conversely, the size of the edges is proportional to the flux. (B) A close-up view of the most relevant regions of the network, showing the four intermediate states mostly involved in the transition from the EC to bound state (endpoints A and B, respectively).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4. Retrospective validation of the MSM-derived path though projection of the conventional-MD trajectories on the PCVs space. Each color of the grayscale, from black to light grey (labels A0-C9 in the upper left corner of the plot), identifies one of the 11 different trajectories used to build the MSM. The larger colored dots, labelled A, I1-4, and B, are the projection of the 6 states used as templates to construct the frameset. These are consistent with the labels used in Figure 2. The corresponding configurations are depicted right below the plot, labeled using the same color code. Conversely, right above the plot, sample structures are shown for dense regions in the PCVs space depicting either alternative binding pathways or non-productive trajectories.

ACS Paragon Plus Environment

Page 60 of 64

Page 61 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 5. Mechanistic interpretation of the drug binding process. (A) Free energy surface for the Alprenolol binding to the β2-AR as reconstructed by metadynamics along the guess path. Iso-lines are plotted each 2 kcal/mol. The main metastable states are reported together with selected configurations. The minimum free energy pathway is displayed in red (solid line) along with an alternative route (dashed line). (B) Free energy profile along the MFEP (black line) together with the associated statistical uncertainty (gray bars). (C) Focus on the Alprenolol rearrangement upon the M3 to M6 transition (red to blue color scale, nitrogen atom highlighted in yellow).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 6. Relevant states along the Minimum Free Energy Path. (A) One representative configuration per relevant minima. The color gradient goes from dark red (EC formed) to blue (crystal pose). (B) Representative configuration for the M1 state. (C) Representative configuration for the M3 state. (D) Halfturn rotation along the ligand main axis to move from M3 (darker pink) to M4 (lighter pink). (E) Representative configuration for the M5 and M6 states. (F) Representative configuration for the M7 state. (G) The 4 Å displacement to go from M8 (white) to M9 (azure). (H) Representative configuration for the M11 state. In panels (B) to (H), ligand color code is consistent with panel (A); ligand polar atoms are highlighted as spheres (blue for nitrogen, red for oxygen) and polar hydrogens are also shown.

ACS Paragon Plus Environment

Page 62 of 64

Page 63 of 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 7. Assessing the response of residues Phe193 and Tyr308 to alprenolol passage through the narrow crevice. (A) Phenyl-phenyl side-chains’ distance between Phe193 and Tyr308 plotted against simulation time (red line, left y-axis). The instantaneous value of S during metadynamics simulation is also shown (black line, right y-axis). (B) Scatter plot of S plotted against the phenyl-phenyl side-chains’ distance. (C) Representative configurations extracted from the metadynamics simulation showing the phenyl-phenyl gate in the closed, open, and wide-open states (left, center, and right, respectively).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 8. The alternative binding path. Configurations sampled in the alternative path, namely M10’ (A) and M12’ (B). Alprenolol is colored in yellow.

ACS Paragon Plus Environment

Page 64 of 64