Efficient Round-Trip Time Optimization for Replica-Exchange

May 16, 2017 - Replica-exchange enveloping distribution sampling (RE-EDS) allows the efficient estimation of free-energy differences between multiple ...
0 downloads 4 Views 1MB Size
Subscriber access provided by CORNELL UNIVERSITY LIBRARY

Article

Efficient Round-Trip Time Optimisation for ReplicaExchange Enveloping Distribution Sampling (RE-EDS) Dominik Sidler, Michael Cristofol Clough, and Sereina Riniker J. Chem. Theory Comput., Just Accepted Manuscript • Publication Date (Web): 16 May 2017 Downloaded from http://pubs.acs.org on May 21, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Efficient Round-Trip Time Optimisation for Replica-Exchange Enveloping Distribution Sampling (RE-EDS) Dominik Sidler, Michael Cristòfol-Clough, and Sereina Riniker∗ Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland E-mail: [email protected] Abstract Replica-exchange enveloping distribution sampling (RE-EDS) allows the efficient estimation of free-energy differences between multiple end-states from a single molecular dynamics (MD) simulation. In EDS, a reference state is sampled, which can be tuned by two kinds of parameters, i.e. smoothness parameters(s) and energy offsets, such that all end-states are sufficiently sampled. The choice of these parameters is, however, not trivial. Replica exchange (RE) or parallel tempering is a widely applied technique to enhance sampling. By combining EDS with the RE technique, the parameter choice problem could be simplified and the challenge shifted towards an optimal distribution of the replicas in the smoothness-parameter space. The choice of a certain replica distribution can alter the sampling efficiency significantly. In this work, global roundtrip time optimisation (GRTO) algorithms are tested for the use in RE-EDS simulations. In addition, a local round-trip time optimisation (LRTO) algorithm is proposed for systems with slowly adapting environments, where a reliable estimate for the roundtrip time is challenging to obtain. The optimisation algorithms were applied to REEDS simulations of a system of nine small-molecule inhibitors of phenylethanolamine

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

N -methyltransferase (PNMT). The energy offsets were determined using the recently proposed parallel energy-offset (PEOE) estimation scheme. While multi-state GRTO algorithm yielded the best replica distribution for the ligands in water, the multi-state LRTO algorithm was found to be the method of choice for the ligands in complex with PNMT. With this, the 36 alchemical free-energy differences between the nine ligands were calculated successfully from a single RE-EDS simulation of 10 ns length. RE-EDS presents thus an efficient method for the estimation of relative binding free energies.

1

Introduction

Methods involving molecular dynamics (MD) or Monte Carlo (MC) simulations belong to the most accurate but computationally expensive techniques to estimate free-energy differences. 1,2 Most of the popular techniques such as thermodynamic integration (TI) 3 or free energy perturbation (FEP) 4 are restricted to problems with two end-states A and B. Thus, if for example the relative binding free energies between a series of ligands are to be estimated, a separate calculation must be performed for each pair of ligands. To reduce the resulting combinatorial explosion, schemes such as the lead optimisation mapper 5 were developed. A more elegant approach to estimate free-energy differences between multiple end-states is offered by enveloping distribution sampling (EDS), 6–10 where different end-states are combined into one reference-state Hamiltonian HR . In principle this allows the sampling of multiple (N ) end-states in a single simulation, which is potentially more efficient than any pairwise approach. The EDS reference state can be tuned by two kinds of parameters to ensure sufficient sampling of all end-states. 8 The smoothness parameter s flattens the complex landscape of VR . The energy offsets EiR allow the modification of the contribution of each end-state i to VR individually. The challenge in EDS is thus the optimal choice of these reference-state parameters. Different algorithms to determine the parameters in an automated manner were proposed in the past. 9,11 However, the first scheme showed an artificial reciprocal relationship between the smoothness parameter and the energy offsets, whereas 2

ACS Paragon Plus Environment

Page 2 of 33

Page 3 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the second scheme is limited to two end-states only. 11 Successful applications for a diverse range of systems have been reported in the past for pairwise EDS using this scheme. 11–13 In order to go beyond two end-states, EDS can be combined with Hamiltonian replica exchange (H-RE) 14–16 into replica-exchange EDS (RE-EDS). The approach was originally developed for constant-pH MD simulations, 17,18 and recently generalized to arbitrary systems. 19 Using RE-EDS, the problem of choosing optimal smoothing parameter(s) can be separated from the problem of choosing optimal energy offsets in practice. 19 Thus, a parallel energy-offset estimation (PEOE) scheme could be developed to estimate the energy offsets for multiple states from a short RE-EDS simulation using default parameters. 19 If the barriers between the end-states are sufficiently high, a region with reduced exchange acceptance probability between neighboring replicas (termed “gap region”) was observed in RE-EDS simulations. 19 The extent and magnitude of the gap region depends both on the height of the barriers and the number of end-states in the system. Therefore, a scheme to optimally distribute the replicas in s-space is needed, especially for systems with a slowly adapting environment such as protein binding pockets. In this study, we test algorithms to optimize the replica distribution by minimizing globally the round-trip time, which were developed in the past for temperature replica-exchange MD (T-REMD) simulations, 20–23 for the use in RE-EDS simulations. In addition, a local round-trip optimisation algorithm is introduced and compared to the global schemes. Unlike the latter, the local algorithm does not impose a constant average acceptance probability, which possibly makes it better suited for systems with pronounced regions of low exchange acceptance due to a slowly adapting environment. The algorithms are applied to a set of nine small-molecule inhibitors of phenylethanolamine N -methyltransferase (PNMT), for which the relative binding free energies between pairs of ligands are calculated using, complex − ∆Gfree ∆∆Gbind BA BA = ∆GBA

3

ACS Paragon Plus Environment

(1)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 33

is the free-energy difference between two ligands A and B in complex with where ∆Gcomplex BA bind the protein, and ∆Gfree BA between them in solution. ∆∆GBA can be directly compared to

the difference between the binding free energy of the two ligands A and B as obtained from experiment, bind ∆∆Gbind − ∆Gbind BA = ∆GB A .

(2)

The system with PNMT and its inhibitors can be considered as a proof of concept for RE-EDS as it involves ligands with diverse functional groups leading to a gap region with pronounced round-trip bottlenecks in the RE-EDS simulation. By combining the PEOE scheme with the round-trip optimisation algorithms, 36 relative binding free energies can be calculated from two simulations of 10 ns length. RE-EDS presents thus a computationally efficient alternative to TI or FEP methods.

2 2.1

Theory Replica-Exchange Enveloping Distribution Sampling (RE-EDS)

Enveloping distribution sampling (EDS) is based on a reference state potential energy function VR , which combines the Boltzmann sum of N end-states as follows, 6,8,24 ~ R ) = −(βs)−1 ln VR (r, s, E

(

N X

e

−βs(Vi (r)−EiR )

i=1

)

,

(3)

where β = 1/(kB T ) with kB being the Boltzmann constant and T the temperature, Vi (r) is the potential energy of end-state i, s is the smoothness parameter and EiR are the energy R ~ R , s}. offsets. In this form, the reference state has N + 1 parameters, {E1R , ..., EN , s} = {E

Schematically, the effect of the energy offset EiR can be interpreted as a change of the depth of the local minima that corresponds to end-state i in the reference-state potential energy surface. In addition, lowering the s-parameter leads to a general smoothing of the referencestate potential energy surface, thus reducing barriers between different end-states. 11,19 It is 4

ACS Paragon Plus Environment

Page 5 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

important to note that the effects of EiR and s on the shape of the reference-state potential energy surface cannot be decoupled mathematically. The free-energy difference between any end-state pair ji enveloped in VR can be calculated by using ensemble reweighting,

∆Gji = −β

Thus,

N 2



−1

e−β(Vj −VR ) R ln −β(V −V ) . i R e R

(4)

free-energy differences can be calculated from a single simulation of R. In

order to obtain sufficient transitions between multiple end-states at physically realistic svalues (e.g. s = 1), the combination of EDS and replica exchange (RE) 14–16,25 (acting on the smoothness parameter s) was proposed in the context of constant-pH MD simulations 17,18 and recently generalized to arbitrary systems. 19 For this, a robust approach to estimate the energy offsets was developed. 19 The parallel energy-offset estimation (PEOE) scheme is based on the observation that the choice of the energy offsets and of the smoothness parameter can be decoupled in practice. As shown in Ref. 19, the PEOE scheme automatically identifies an “undersampling” region based on data from a short initial RE-EDS simulation with default parameters. Using the data obtained from the undersampling region, it is possible to estimate accurate energy offsets for all end-states in parallel without additional computational effort.

2.2

Round-trip time optimisation algorithm

Depending on the distribution of the replicas in the parameter space, the resulting Metropolis acceptance probability between a pair of replicas can vary substantially. This can for example be the case for RE-EDS when the s-values are distributed equally on a logarithmic scale. 19 Katzgraber et al. 21 showed for temperature-RE MD simulations (T-REMD) that the local optimisation of the transition probabilities between replicas does not necessarily lead to an optimal distribution of the replicas. Instead, the flow between the extremal replicas should be optimised, leading to non-constant acceptance rates in the vicinity of phase tran-

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 33

sitions. Based on an initial replica parameter distribution, they proposed an iterative replica distribution algorithm, which maximises the flow globally, for a fixed number of replicas M . In the following the optimisation scheme is adapted for RE-EDS. 2.2.1

One-dimensional global round-trip time optimisation (1-GRTO)

The basic principle of the round-trip time optimisation algorithm is that the distribution of the replicas is optimised such that the expected time τ to cross the whole parameter set S (i.e. ordered set of s-values) for a given number of replicas M is minimised. To this end, Katzgraber et al. 21 calculated the fraction of replicas, f1 , which have visited one of the two extremal replicas most recently for every replica i. In RE-EDS, f1 is defined as,

f1 (si ) :=

ndown (si ) , nup (si ) + ndown (si )

(5)

where nup is the number of visits coming from the lower bound s1 = min(s ∈ S) and ndown is the number of visits coming from the upper bound sM = max(s ∈ S) (top panel in Fig. 1). As shown by Nadler et al. 23 in a more general way, the fraction f gives rise to a stationary distribution of the probability flow between the replicas obeying boundary conditions f (sM ) = 1 and f (s1 ) = 0, leading to the definition of a discrete stationary current J between two neighbouring replicas as, !

J = f (si+1 )WS (si+1 → si ) − f (si )WS (si → si+1 ) = const

(6)

with a transition probability given by,



WS (si+1 → si ) = WS (si → si+1 ) = pacc

(7)

due to the symmetry of the RE process and the Metropolis acceptance criterion. 26 Solving the difference equation with the given boundary conditions leads to the following expression

6

ACS Paragon Plus Environment

Page 7 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

for J and the expected round-trip time τ , respectively, 23

J = τ −1 =

" N −1 X i=0

1 Ws (si → si+1 )

#−1

.

(8)

It can be shown analytically using Lagrange multipliers 23 that a minimal round-trip time τ under the constraint that the average acceptance probability is constant, i.e., M X

!

WS (si ) = const

(9)

i=1

leads to the following expressions for the acceptance probabilities and flow distribution among the replicas, WSopt = const, f opt (si ) =

i−1 . M −1

(10)

These expressions are used for the one-dimensional global round-trip time optimisation scheme (1-GRTO) as described in the literature. 20–23 2.2.2

Multi-state global round-trip time optimisation (N-GRTO)

The 1-GRTO algorithm described above assumes ergodicity, i.e. the acceptance probability between neighbouring replicas is independent of the current system state. However, ergodicity breaking may occur spontaneously in a general system as described by Nadler et al. 23 In such a case, different acceptance rates may be observed between neighbouring replicas depending on the current system state. In (RE-)EDS, the ergodicity is broken by construction of the EDS reference state R given in Eq. (3). Thus, the potential-energy landscape splits into multiple distinct minima, which correspond to the end-states, for higher s-values, e.g. s → 1 (bottom panel in Fig. 1). In contrary to spontaneous ergodicity breaking, a specific end-state j can be assigned to R at every time point t for high s-values in EDS, as shown for s → ∞ in Ref. 19. Therefore, it is possible to consider each end-state j separately in the GRTO algorithm and assign an

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 8 of 33

Page 9 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

individual weight wj to it with

PN

j=1

wj = 1,

fN (si ) :=

N X j=1

wj njdown (si )N , nup (si ) + N njdown (si )

(11)

obeying the boundary conditions fN (sM ) = 1, fN (s1 ) = 0. The term njdown (si ) corresponds to the number of visits coming from the upper bound with the lowest potential energy among the end-states (Vj < Vk , ∀k ∈ [N ] \ j). Note that the ergodicity is not broken in the undersampling region (bottom panel in Fig. 1). Thus, no end-state j can be assigned to nup (si ) for small s-values. We decided to assign an end-state j in a random manner, resulting in a factor N in Eq. (11). There may occur additional spontaneous ergodicity breaking in (RE-)EDS simulation depending on the complexity of the potential energy landscape of each individual end-state, but this situation is not considered in the multi-state GRTO (N-GRTO) algorithm described above. 2.2.3

Multi-state local round-trip time optimisation (N-LRTO)

The GRTO algorithm attempts to find the globally optimal replica distribution for a fixed number of replicas. In practice, however, many of the underlying assumptions can be violated substantially. First, there is no guarantee that any chosen fraction f is a good representation of the highly complex potential landscape in a generalised ensemble. 23 For example, the optimisation of a one-dimensional measure f could lead to a s-distribution favouring the transition between end-states A and B but creating a bottleneck for the transition to endstate C. The round-trip time is minimised but end-state C would not be sampled in finite simulation time. In the case of a slowly adapting environment (e.g. a protein binding pocket), some regions of the phase space may only become accessible after very long simulation times. Thus, this information cannot be incorporated in an optimisation scheme that is practically feasible, resulting in a non-optimal solution for the globally minimised round-trip time. Second, for

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 33

very rough potential-energy landscapes (e.g. a ligand in complex with a protein), the main challenge can be to find a replica distribution that allows any round-trips at all. Therefore, instead of a globally optimal solution, one rather desires a fast and robust scheme, which identifies the bottlenecks of the stationary current J in an efficient manner and distributes a suitable number of replicas M reasonably within the identified round-trip bottlenecks. This is particularly true for RE-EDS simulations, but may also apply to other RE variants. The multi-state local round-trip time optimisation (N-LRTO) algorithm presented here involves the following key ideas: • Replicas are added in an iterative procedure until the desired total number of replicas M is reached or a stopping criterion is fulfilled. • The replicas of the previous iteration step k are preserved in the iteration step k + 1. • At iteration step k + 1, ∆Mk replicas are assigned to the Mk − 1 s-ranges in between existing replicas, using the measured fraction fk as an indicator for the round-trip bottlenecks. • The new ∆Mk replicas are distributed within the assigned s-range such that the stationary current Jk+1 is maximised locally. An alternative local round-trip optimisation algorithm was previously proposed by Nadler et al., 27 which relies on a local measure for the flow instead of f given in Eq. (5) or (11). In the following, a formally exact definition of the proposed N-LRTO algorithm is given. Let Sk be the ordered set of s-values at iteration k containing Mk replicas used for the kth iteration step of the algorithm and fk (si ) the corresponding fraction according to Eq. (5) or (11). The ordered set in the next iteration step Sk+1 shall fulfil, !

Sk+1 = Sk ∪ ∆Sk

10

ACS Paragon Plus Environment

(12)

Page 11 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

with !

(13)

Sk ∩ ∆Sk = ∅.

∆Sk corresponds to the set of ∆Mk newly added replicas. For the moment, the choice of ∆Sk (i.e. the distribution of the new replicas over the s-ranges) remains arbitrary. Instead, the local Jk+1 optimisation problem is treated first. Let us define a piecewise linear continuation fˆk (s), s ∈ [s1 , sMk ] of the measured fraction fk (si ) as follows, fk (⌈s⌉k ) − fk (⌊s⌋k ) fˆk (s) := fk (⌊s⌋k ) + (s − ⌊s⌋k ) ⌈s⌉k − ⌊s⌋k (14)

= fk (⌊s⌋k ) + (s − ⌊s⌋k )hk (s),

where the symbols ⌊.⌋k and ⌈.⌉k correspond to the next smaller or larger si ∈ Sk . If one assumes that the acceptance probability between two replicas WS (u, v) ∈ [0, 1] can locally (i.e. within the range {u, v} ∈ (si , si+1 )) be expressed as a function of ∆fˆ(u, v) := fˆ(v) − fˆ(u) ∈ [0, 1] only, the unknown acceptance probability can be written as a power series expansion on a compact support as follows,

WS (u, v) =

∞ X

αm (1 − ∆fˆ)m ,

(15)

m=1

with

P∞

m=1

! ! ! αm = 1, assuming boundary conditions WS (∆fˆ = 0) = 1 and WS (∆fˆ = 1) = 0.

Using Eqs. (8), (14) and (15), the local J-optimisation problem for new sl ∈ ∆Sklocal := ∆Sk ∩ (si , si+1 ) can be reformulated as, ~ s ∈∆S local J = ∇ l k

∞ X

mαm sl hk (sl ) 1 − (sl − sl−1 )

m=1



∞ X

msl hk (sl ) 1 − (sl+1 − sl )

m=1

11

m−1

!

=0

ACS Paragon Plus Environment

m−1 (16)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 33

Therefore, J is locally maximized (or τ minimized, respectively), if !

sl − sl−1 = sl+1 − sl ∀sl ∈ ∆Sklocal ,

(17)

in case of a linearly approximated fˆ as given in Eq. (14). Or in other words, if the newly added replicas (sl ) are distributed equidistantly. Now the question remains of how to assign a certain number of new replicas to a specific s-range: In order to decide whether or not a new replica sl should be placed within (si , si+1 ), we propose to use ∆fˆ as an indicator for the round-trip bottlenecks and, thus, to reduce the original optimisation problem for iteration step k + 1 from h i min τk+1 (Sk , fˆk , ∆Mk )

(18)

to the following MinMax problem,

min ∆Sk



max

si ,si+1 ∈Sk+1

h

fˆk (si+1 ) − fˆk (si )

i ,

(19)

which optimises J locally (i.e. fulfils Eq. (17)) . The later problem can be solved straightforward on a computer for every iteration step, leading to a solution for ∆Sk . Note that the constraints for constant average acceptance probabilities (as in Eq. (9)) were not imposed in the derivation of N-LRTO. This is most likely an important advantage of N-LRTO for the application with RE-EDS, because it allows the algorithm to deal better with pronounced RE-acceptance gap regions and with the high uncertainty of the underlying f estimator. Pronounced gap regions are likely to be present in systems with a slowly adapting environment such as protein binding pockets. In addition, the preservation of the previous set si ∈ Sk in the next iteration step should allow the algorithm to reduce the occurrence of fluctuations in the s-distribution. 21 This in turn will reduce the number of necessary iteration steps. In combination, these features make the algorithm more robust and efficient.

12

ACS Paragon Plus Environment

Page 13 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

On the contrary, the local optimisation in the absence of pronounced round-trip bottlenecks (i.e. J >> 0) may result in longer round-trip times, especially if too many replicas are added. Therefore, a good balance between the number of iterations steps and the number of replicas added at each step ∆Mk is crucial for efficient production simulations with limited computational resources. This balance will depend on the system studied, which makes it difficult to address the issue in general. Instead, a practically feasible stopping criterion for the iteration process is defined in the Methods section 3.2.

3 3.1

Methods Workflow

A brief overview of the general RE-EDS workflow for a system of interest is shown in Fig. 2. In a first step, the energy offsets are chosen based on a short RE-EDS simulation with default parameters (EiR = 0, logarithmically distributed s-values). In this work, the PEOE algorithm 19 was used. In a second step, the s-distribution is optimised iteratively using for example one of the three round-trip optimisation schemes (1-GRTO, N-GRTO or N-LRTO). In a last step, a production run is performed in order to obtain the free-energy differences ∆G using the estimated energy offsets and optimal distribution of replicas in s-space.

3.2

Extremal s-values, state assignment and stopping criterion for round-trip time optimisation

For all round-trip time optimisation schemes, an initial set of M0 logarithmically distributed s-values was employed as starting configuration, as used in our previous work. 19 The lower and upper bounds were set to smin = 0.001 and smax = 1.0, respectively. This choice of smin ensures that the system is in “undersampling” 19 and thus sufficient transitions between every enveloped end-state occur. The choice of smax guarantees a physically realistic reference state.

13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 14 of 33

Page 15 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

procedure in combination with the aforementioned assignment of additional N − 1 replicas to obtain a fast and reliable solution for the s-distribution in practice.

3.3

Simulation details

RE-EDS simulations of nine tetrahydroisoquinoline inhibitors of phenylethanolamine N methyltransferase (PNMT) were performed in water and in complex with PNMT using a modified version of the GROMOS software package. 28,29 The set of ligands is shown in Fig. 3. Ligand 2 in Ref. 11 was not considered in this study as it is differently charged than the other ligands which causes problems. 19 The details on the derivation of the force-field parameters based on the GROMOS 53A6 force field 30 are specified in Ref. 31. The topologies are provided in the supplementary material of Ref. 11. All relevant system parameters were chosen identical to Refs. 11 and 19. Therefore, the contribution of the excluded atoms to the reaction field was not considered as done in older versions of GROMOS in order to compare the results. Simulations were carried out under NPT conditions with periodic boundaries. Newton’s equations of motion were integrated using the leap-frog scheme 32 with a time step of 2 fs. The simple-point-charge (SPC) water model 33 was used. To keep the temperature close to its reference value, weak coupling to a temperature bath with a relaxation time of 0.1 ps was applied. 34 The solute(s) and the solvent were coupled to separate temperature baths. The pressure was maintained close to 1.013 bar (1 atm) by weak coupling to a pressure bath with a relaxation time of 0.5 ps and an isothermal compressibility of 4.575 · 10−4 (kJ mol−1 nm−3 )−1 . A twin-range cutoff scheme with 0.8 nm and 1.4 nm was used for the nonbonded interactions. A reaction field force 35 with a relative dielectric permittivity of 61 36 was applied to treat electrostatic interactions beyond the long-range cutoff. Bond lengths in the peptide and in the solvent were constrained to ideal values using the SHAKE algorithm 37 with a tolerance of 10−4 nm. The centre of mass motion was stopped every 20 ps. The energies were written out every 20 fs for analysis. The free-energy differences were calculated using the GROMOS analysis 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 33

program 38 dfmult with data obtained at s = 1. (R)

(R) NH 2

NH 2

1

NH 2 4

3

NH 2

O2N

5 O

(S) O2N

NH 2 6

(S)

OH H 2N O

NH 2 H 2N

S O

7

O

NH 2

S O

8

(R)

OH

(S) O

O2N

NH 2 9

NH 2 10

Figure 3: Nine tetrahydroisoquinoline derivates, which are inhibitors of phenylethanolamine N -methyltransferase (PNMT).

3.3.1

Ligands in water

The starting conformations were taken from Ref. 11. The ligands were solvated in a periodic, cubic box with 975 SPC water molecules and a dual-topology approach as in Ref. 11 was used. The nine energy offsets (listed in Table 1) were taken from Ref. 19, i.e. they were determined using PEOE scheme. In total, three different RE-EDS simulations were performed with s-distributions determined using the 1-GRTO, N-GRTO or N-LRTO algorithms. For the multi-state schemes, each end-state was weighted equally in the calculation of fN , i.e. wi = 1/N . For the first optimisation step (step 0), four s-values were distributed logarithmically between 1 and 0.001 and the simulation length was 0.1 ns. Two iteration steps were carried out, each time adding four additional replicas and doubling the simulation time (i.e. 0.2 ns for step 1, and 0.4 ns for step 2). For the GRTO algorithms, three additional iteration steps were performed (each of 0.4 ns length) in order to evaluate the convergence of the s-distribution. The resulting s-distributions were restricted to 12 replicas in order to serve as a test system for the three different s-distribution algorithm. RE attempts were carried out every 20 fs. Production runs were performed for 10 ns.

16

ACS Paragon Plus Environment

Page 17 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

R Table 1: Energy offsets E¯i,RE for nine ligands in water or in complex with PNMT estimated using the parallel energy-offset estimation (PEOE) 19 scheme. For the ligands in water, the values were taken from Ref. 19.

Ligand i 1 3 4 5 6 7 8 9 10 3.3.2

In water 19 In complex with PNMT R R ¯ Ei,RE [kJ/mol] E¯i,RE [kJ/mol] 0 0 −1.9 −13.9 −3.2 −7.7 7.4 −13.9 −36.8 −66.1 −335.9 −351.0 −383.0 −406.3 −8.7 −25.2 −123.8 −122.9

Ligands in complex with PNMT

The starting conformation was taken from Ref. 11. The ligand-protein complex was solvated in a periodic, cubic box with 13303 SPC water molecules. In contrast to Ref. 11, a dualtopology approach was used. The energy offsets were estimated using the PEOE scheme from an initial RE-EDS simulation of 2.5 ns length with all energy offsets set to zero (i.e. EiR = 0) and 21 logarithmically distributed s-values between 1 and 0.001. RE attempts were carried out every 10 ps. The threshold potential energy was set to Vth = 0 and the threshold fraction and convergence radius were set to 1.0. The resulting energy offsets are listed in Table 1. The s-distribution was optimised using the N-GRTO or N-LRTO algorithms. Each endstate was weighted equally in the calculation of fN , i.e. wi = 1/N . For the first optimisation step (step 0), eight s-values were distributed logarithmically between 1 and 0.001 and the simulation length was 0.2 ns. Two iteration steps were carried out, each time doubling the number of replicas and doubling the simulation time (i.e. 0.4 ns and 16 replicas for step 1, and 0.8 ns and 32 replicas for step 2). The final s-distribution contained 64 replicas. For the N-GRTO algorithm, two additional iteration steps were performed with 64 replicas (each of 1.6 ns length) in order to evaluate the convergence of the s-distribution. RE attempts were

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

carried out every 20 fs. The production run using the s-distribution determined by N-LRTO was performed for 10 ns.

4

Results and Discussion

4.1

Comparison of 1-GRTO, N-GRTO and N-LRTO using nine ligands in water

The distributions of the replicas or s-values, respectively, obtained using the three different algorithms 1-GRTO, N-GRTO and N-LRTO are shown in Fig. 4. For the 1-GRTO scheme, fluctuations in the distribution were observed between the iteration steps. This means that the replicas became too concentrated in a bottleneck region at one iteration step, thus creating a new bottleneck in another region in s-space. This led to insufficient sampling of some end-states in the RE-EDS simulation using the s-distribution at step 2, resulting in substantial deviations of the estimated free-energy differences from the TI reference data (Fig. 5A and Table S2 in the Supporting Information). Note that the occupancy of an endstate does of course not only depend on the s-distribution, but also on an accurate choice of the energy offsets. Thus, an equal occupancy of all end-states cannot be expected in any case. After two more iteration steps with a constant number of replicas, the distribution was more converged, resulting in better although still not sufficient sampling of all end-states (Fig. S1 in the Supporting Information). In a slowly adapting environment, it may also be possible that no converged solution can be found in a reasonable simulation time. Using the N-GRTO scheme, no fluctuations were observed. The free-energy differences obtained from the RE-EDS simulation using the s-distribution at iteration step 2 were in excellent agreement with the TI reference data (Fig. 5B and Table S2 in the Supporting Information). The distribution did not change significantly when more iteration steps were carried out (with a constant number of replicas) and thus the results obtained from the subsequent RE-EDS simulation were similar (Fig. S1 in the Supporting Information). 18

ACS Paragon Plus Environment

Page 18 of 33

Page 19 of 33

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 20 of 33

Page 21 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The s-distribution obtained at iteration step 2 using the N-LRTO algorithm is similar to the one estimated by N-GRTO. However, in the subsequent RE-EDS simulation using the N-LRTO s-distribution, ligand 8 was less often sampled, which affected the accuracy of the estimated free-energy differences involving ligand 8 (Fig. 5C and Table S2 in the Supporting Information). As water is a fast adapting environment, there are no pronounced roundtrip bottlenecks within the gap region (the system could be simulated successfully using a manually adapted s-distribution 19 ). This allows a successful global optimisation. For systems with a fast adapting environment, the N-GRTO scheme is therefore the algorithm of choice. In a slowly adapting environment like a protein binding pocket, on the other hand, pronounced bottlenecks likely exist, which will hamper a global optimisation and favour a local one.

4.2

Comparison of N-GRTO with N-LRTO using nine ligands in complex with PNMT

While the successful estimation of free-energy differences between the nine ligands in water was possible after manual adjustment of the s-distribution, 19 the same approach failed for the ligands in complex with PNMT. For example, with 24 logarithmically distributed s-values, no round-trips at all were observed (Fig. S2 in the Supporting Information). Increasing the number of replicas did not improve the number of round-trips (data not shown). These observations highlighted the existence of pronounced round-trip bottlenecks in the gap region of s-space in such a system, and thus the need for an automated optimisation of the replica distribution. The evolution of the s-distribution using the N-LRTO algorithm is shown in the left panel of Fig. 6. The numerical values of the final 64 s-values are listed in Table S1 in the Supporting Information. The most pronounced bottleneck was identified between s = 0.0179 and s = 0.0332 with nearly half of the replicas being located in this region of s-space (i.e. 28 out of 64 replicas). In the subsequent RE-EDS simulation using this s-distribution, all 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

end-states were sampled sufficiently (Fig. 6 and Fig. S3 in the Supporting Information) and the resulting free-energy differences agreed well with the TI reference data (Figs. 6 and 7, and Table 2). The agreement was in many cases better than for the previous results with two-state EDS 11 (Table 2). Nevertheless, differences in occupancy between the different endstates were observed (Fig. 6). Ligand 7 was sampled predominantly, whereas the occupancy for ligands 6, 9 and 10 was comparatively low. Extension of the simulations to 15 ns did not alter the results substantially (data not shown). This could (at least partly) be caused by not entirely optimal energy offsets as these were estimated from a RE-EDS run with only 21 logarithmically distributed s-values. Therefore, for the application of the RE-EDS approach to other proteins systems where this issue may be more problematic, a second PEOE step could be envisioned using the optimised s-distribution. For comparison, the evolution of the s-distribution using the N-GRTO algorithm is shown in Fig. S4 in the Supporting Information. Significant changes in the proposed s-distribution can be observed for the additional iteration steps with a constant number of replicas, indicating that the distribution is not converged. Therefore, a converged solution may become computationally expensive to obtain. In addition, a high concentration of s-values close to 1 was observed, which turned out to be a numerical artifact. Analyses showed that this is caused by a poor sampling of ligand 10, which made its contribution to fN very unreliable and sensitive towards numerical artifacts at the boundary. This undesired effect highlights the vulnerability of the GRTO algorithm in case of a high uncertainty of the underlying estimator fN (si ).

4.3

Relative binding free energies

The relative binding free energies were calculated from the free-energy differences obtained from the RE-EDS simulations using Eq. (1) and compared to the experimental counterparts calculated using Eq. (2) (Fig. 8 and Table S4 in the Supporting Information). For ligands 6 and 8, the experimentally available binding free energies are for the other enantiomer, 22

ACS Paragon Plus Environment

Page 22 of 33

Page 23 of 33

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2: Free-energy differences ∆Gji obtained from a RE-EDS simulation of the nine ligands in complex with PNMT after 10 ns. The distribution of the 64 s-values obtained at iteration step 3 using the N-LRTO algorithm. The TI reference data were taken from Ref. 31. The values are shown graphically in Fig. 7. The errors correspond to statistical uncertainties. 8 Pair i-j 1-3 1-4 1-5 1-6 1-7 1-8 1-9 1-10 3-4 3-5 3-6 3-7 3-8 3-9 3-10 4-5 4-6 4-7 4-8 4-9 4-10 5-6 5-7 5-8 5-9 5-10 6-7 6-8 6-9 6-10 7-8 7-9 7-10 8-9 8-10 9-10

TI ∆Gji [kJ/mol] −1.0 ± 1.5 −4.5 ± 1.5 −3.5 ± 1.6 −48.1 ± 1.9 −354.5 ± 1.4

EDS ∆Gji 11 [kJ/mol] −4.4 ± 1.4 −8.6 ± 1.6 −4.0 ± 0.9 −60.1 ± 0.8 −352.0 ± 2.0

−110.9 ± 1.5

−126.6 ± 1.5

−396.2 ± 2.2 −8.3 ± 1.9

−399.1 ± 1.3 −9.0 ± 1.2

−355.6 ± 2.0

−345.9 ± 1.4

−103.3 ± 1.5 −41.3 ± 1.8

−122.6 ± 1.1 −34.1 ± 1.5

−8.3 ± 1.2

−9.5 ± 06

−67.5 ± 2.0 −45.8 ± 1.9

−48.8 ± 1.8 −51.6 ± 1.2

24

RE-EDS ∆Gji [kJ/mol] −5.7 ± 0.1 −2.5 ± 0.1 −5.9 ± 1.2 −53.2 ± 0.2 −356.0 ± 0.1 −402.6 ± 0.1 −12.9 ± 0.2 −109.3 ± 0.2 3.2 ± 0.1 −0.1 ± 1.3 −47.5 ± 0.2 −350.2 ± 0.1 −396.8 ± 0.1 −7.2 ± 0.2 −103.5 ± 0.2 −3.4 ± 1.3 −50.7 ± 0.2 −353.5 ± 0.1 −400.0 ± 0.1 −10.4 ± 0.2 −106.8 ± 0.2 −47.4 ± 1.3 −350.1 ± 1.3 −396.7 ± 1.3 −7.0 ± 1.3 −103.4 ± 1.3 −302.7 ± 0.2 −349.2 ± 0.2 40.3 ± 0.3 −56.0 ± 0.3 −46.6 ± 0.1 343.1 ± 0.2 246.7 ± 0.2 389.7 ± 0.2 293.3 ± 0.2 −96.4 ± 0.3

ACS Paragon Plus Environment

Page 24 of 33

Page 25 of 33

whereas for ligands 4 and 10 the racemic mixtures were measured in experiment. The difference in binding free energy between the enantiomers can be up to 9 kJ/mol as found for ligand 3 (see Table 1 in Ref. 31). In addition, the values for ligands 3, 4, 9 and 10 were obtained for bovine PNMT, whereas the other experimental values and the calculated values are for human PNMT. The largest deviations between the calculated and experimental ∆∆Gbind values were observed for pairs involving ligand 10 (shown as closed symbols in Fig. 8). The RMSE over the whole set of 36 relative binding free energies is 9.2 kJ/mol (or 2.2 kcal/mol). If the values involving ligand 10 are not considered, the RMSE is 4.8 kJ/mol (or 1.2 kcal/mol).

40

30

10

bind

[kJ/mol]

20

Calc. ∆∆G

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0

-10

-20

-30 -30

-20

-10

0 10 bind Exp. ∆∆G [kJ/mol]

20

30

40

Figure 8: Experimental versus calculated relative binding free energies ∆∆Gbind at 1 atm ji and 298 K. Values for ligand pairs where one or both were measured as racemic mixture are shown as triangles. Values for ligand pairs where the experimental value for one or both ligands is for the other enantiomer are shown in red. Values involving ligand 10 are shown as filled symbols. The original experimental data can be found in Refs. 39–41. The numerical values are given in Table S4 in the Supporting Information. The black solid line corresponds to y = x and is meant to guide the eye. The dashed lines indicate an interval of ± 4.184 kJ/mol (i.e. ± 1 kcal/mol).

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

5

Conclusion

RE-EDS allows the efficient estimation of free-energy differences between multiple end-states from a single simulation, given sufficiently accurate energy offsets and a suitable distribution of the replicas in s-space. In this work, we introduced a new scheme for multi-state local round-trip time (N-LRTO) optimisation for replica-exchange simulations. In contrary to previously proposed global optimisation schemes (1-GRTO and N-GRTO), the N-LRTO approach does not impose a constraint for constant average acceptance probabilities between replicas. Thus, the algorithm may be better suited for systems with pronounced regions of very low acceptance probability, for example systems with slowly adapting environments such as protein binding pockets. Although developed for RE-EDS simulations, the algorithm could be potentially interesting for other replica-exchange variants. The N-LRTO algorithm was tested on a set of nine small-molecule inhibitors of PNMT in water and compared to the global schemes. The number of replicas was restricted to 12. Reasonable distributions of the 12 replicas in s-space were obtained using both multi-state schemes N-GRTO and N-LRTO, with N-GRTO being superior for this system with a fast adapting environment. This is likely due to the low uncertainty in the underlying estimator, which allows a converged global round-trip time optimisation. For the one-state algorithm 1-GRTO, on the other hand, fluctuations in the s-distribution between iteration steps were observed, resulting in insufficient sampling of some of the end-states. The multi-state algorithms were further tested for the nine ligands in complex with PNMT. In this system with a high uncertainty of the underlying estimator, the global N-GRTO algorithm was found to be vulnerable to numerical artifacts due to insufficient sampling of some end-states during the s-distribution optimisation. Using the N-LRTO algorithm, on the other hand, an optimal s-distribution was obtained that allowed sufficient sampling of all end-states in 10 ns using 64 replicas. The resulting free-energy differences were in good agreement with the TI reference data. Thus, for complex systems with a slowly adapting environment such as protein binding pockets, the N-LRTO algorithm is considered 26

ACS Paragon Plus Environment

Page 26 of 33

Page 27 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

to be the method of choice as it offers a more robust optimisation. Using RE-EDS, 36 free-energy differences between nine ligands in complex with PNMT were obtained from a single simulation of 10 ns. Thus, RE-EDS presents a computationally efficient approach for the estimation of relative binding free energies. In the current setup, the production run required 64 x 10 ns = 640 ns of simulation time with an additional 33.6 ns for the parameter optimisation. For comparison, TI simulations with 15 λ-points and 2-3 ns at each λ-point would require 1620 ns for all 36 pairs of ligands. If both the forward and backward directions would be calculated for TI in order to determine the hysteresis, the total simulation time would be doubled. In (RE-)EDS, on the other hand, multiple transitions between the end-states occur during a simulation. The system-dependent and thus user-defined parameters in RE-EDS (using the proposed schemes to estimate energy offsets and optimise the replica distribution) are for a given system: the number of initial replicas, the number of replicas added per iteration step, the choice of the lower (and upper) s-bound and the length of the production simulation. As the ligands in complex with PNMT are not a trivial system, the parameter choices used in this study represents a good starting point. The appropriate length of the production run can further be determined by monitoring the convergence of the estimated free-energy differences and the occupation number of the individual end-states. In the future, we plan to apply the RE-EDS methodology to other protein-ligand systems, which will show the robustness of the proposed schemes.

Acknowledgement The authors gratefully acknowledge financial support by the Swiss National Science Foundation (Grant Number 200021-159713) and by ETH Zürich (ETH-08 15-1).

27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Supporting Information Available Additional Tables S1 - S4 and Figures S1 - S4.

References (1) Chipot, C.; Pohorille, A. Free Energy Calculations; Springer-Verlag: Berlin Heidelberg, 2007. (2) Christ, C. D.; Mark, A. E.; van Gunsteren, W. F. Basic Ingredients of Free Energy Calculations: A Review. J. Comput. Chem. 2010, 31, 1569–1582. (3) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300–313. (4) Zwanzig, R. W. High Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420–1426. (5) Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization Mapper: Automating Free Energy Calculations for Lead Optimization. J. Comput. Aided Mol. Des. 2013, 27, 755–770. (6) Han, K.-K. A New Monte Carlo Method for Estimating Free Energy and Chemical Potential. Phys. Lett. A 1992, 165, 28–32. (7) Christ, C. D.; van Gunsteren, W. F. Enveloping Distribution Sampling: A Method to Calculate Free Energy Differences from a Single Simulation. J. Chem. Phys. 2007, 126, 184110. (8) Christ, C. D.; van Gunsteren, W. F. Multiple Free Energies from a Single Simulation: Extending Enveloping Distribution Sampling to Non-Overlapping Phase-Space Distributions. J. Chem. Phys. 2008, 128, 174112. 28

ACS Paragon Plus Environment

Page 28 of 33

Page 29 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(9) Christ, C. D.; van Gunsteren, W. F. Simple, Efficient, and Reliable Computation of Multiple Free Energy Differences from a Single Simulation: A Reference Hamiltonian Parameter Update Scheme for Enveloping Distribution Sampling (EDS). J. Chem. Theory Comput. 2009, 5, 276–286. (10) Christ, C. D.; Van Gunsteren, W. F. Comparison of Three Enveloping Distribution Sampling Hamiltonians for the Estimation of Multiple Free Energy Differences from a Single Simulation. J. Comput. Chem. 2009, 30, 1664–1679. (11) Riniker, S.; Christ, C. D.; Hansen, N.; Mark, A. E.; Nair, P. C.; van Gunsteren, W. F. Comparison of Enveloping Distribution Sampling and Thermodynamic Integration to Calculate Binding Free Energies of Phenylethanolamine N-Methyltransferase Inhibitors. J. Chem. Phys. 2011, 135, 024105. (12) Lin, Z.; Liu, H.; Riniker, S.; van Gunsteren, W. F. On the Use of Enveloping Distribution Sampling (EDS) to Compute Free-Enthalpy Differences Between Different Conformational States of Molecules: Application to 310 -, α-, and π-Helices. J. Chem. Theory Comput. 2011, 7, 3884–3897. (13) Hansen, N.; Dolenc, J.; Knecht, M.; Riniker, S.; van Gunsteren, W. F. Assessment of Enveloping Distribution Sampling to Calculate Relative Free Enthalpies of Binding for Eight Netropsin–DNA Duplex Complexes in Aqueous Solution. J. Comput. Chem. 2012, 33, 640–651. (14) Hansmann, U. H. E. Parallel Tempering Algorithm for Conformational Studies of Biological Molecules. Chem. Phys. Lett. 1997, 281, 140–150. (15) Sugita, Y.; Kitao, A.; Okamoto, Y. Multidimensional Replica-Exchange Method for Free-Energy Calculations. J. Chem. Phys. 2000, 113, 6042. (16) Woods, C. J.; Essex, J. W.; King, M. A. The Development of Replica-Exchange-Based Free-Energy Methods. J. Phys. Chem. B 2003, 107, 13703–13710. 29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(17) Lee, J.; Miller, B. T.; Damjanovic, A.; Brooks, B. R. Constant pH Molecular Dynamics in Explicit Solvent with Enveloping Distribution Sampling and Hamiltonian Exchange. J. Chem. Theory Comput. 2014, 10, 2738–2750. (18) Lee, J.; Miller, B. T.; Damjanovic, A.; Brooks, B. R. Enhancing Constant-pH Dimulation in Explicit Solvent with a Two-Dimensional Replica Exchange Method. J. Chem. Theory Comput. 2015, 11, 2560–2574. (19) Sidler, D.; Schwaninger, A.; Riniker, S. Replica-Exchange Enveloping Distribution Sampling (RE-EDS): A Robust Method to Estimate Multiple Free-Energy Differences from a Single Simulation. J. Chem. Phys. 2016, 145, 154114. (20) Trebst, S.; Huse, D. A.; Troyer, M. Optimizing the Ensemble for Equilibration in BroadHistogram Monte Carlo Simulations. Phys. Rev. E 2004, 70, 046701. (21) Katzgraber, H. G.; Trebst, S.; Huse, D. A.; Troyer, M. Feedback-Optimized Parallel Tempering Monte Carlo. J. Stat. Mech.: Theory Exp. 2006, 2006, P03018. (22) Trebst, S.; Troyer, M.; Hansmann, U. H. Optimized Parallel Tempering Simulations of Proteins. J. Chem. Phys. 2006, 124, 174903. (23) Nadler, W.; Hansmann, U. H. Generalized Ensemble and Tempering Simulations: A Unified View. Phys. Rev. E 2007, 75, 026109. (24) Chen, Y.-G.; Hummer, G. Slow Conformational Dynamics and Unfolding of the Calmodulin C-Terminal Domain. J. Am. Chem. Soc. 2007, 129, 2414–2415. (25) Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314, 141–151. (26) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087– 1092. 30

ACS Paragon Plus Environment

Page 30 of 33

Page 31 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(27) Nadler, W.; Meinke, J. H.; Hansmann, U. H. Folding Proteins by First-Passage-TimesOptimized Replica Exchange. Phys. Rev. E 2008, 78, 061905. (28) Schmid, N.; Christ, C. D.; Christen, M.; Eichenberger, A. P.; van Gunsteren, W. F. Architecture, Implementation and Parallelization of the GROMOS Software for Biomolecular Simulation. Comp. Phys. Comm. 2012, 183, 890–903. (29) Riniker, S.; Christ, C. D.; Hansen, H. S.; Hünenberger, P. H.; Oostenbrink, C.; Steiner, D.; van Gunsteren, W. F. Calculation of Relative Free Energies for LigandProtein Binding, Solvation, and Conformational Transitions Using the GROMOS Software. J. Phys. Chem. B 2011, 115, 13570–13577. (30) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS ForceField Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656–1676. (31) Nair, P. C.; Malde, A. K.; Mark, A. E. Using Theory to Reconcile Experiment: The Structural and Thermodynamic Basis of Ligand Recognition by Phenylethanolamine N-Methyltransferase (PNMT). J. Chem. Theory Comput. 2011, 7, 1458–1468. (32) Hockney, R. W. The Potential Calculation and Some Applications. Methods Comput. Phys. 1970, 9, 135–211. (33) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. In Intermolecular Forces; Pullman, B., Ed.; Reidel: Dordrecht, The Netherlands, 1981; pp 331–342. (34) Berendsen, H. J.; Postma, J. v.; van Gunsteren, W. F.; DiNola, A.; Haak, J. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684–3690. (35) Tironi, I. G.; Sperb, R.; Smith, P. E.; van Gunsteren, W. F. A Generalized Reaction

31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Field Method for Molecular Dynamics Simulations. J. Chem. Phys. 1995, 102, 5451– 5459. (36) Heinz, T. N.; van Gunsteren, W. F.; Hünenberger, P. H. Comparison of Four Methods to Compute the Dielectric Permittivity of Liquids from Molecular Dynamics Simulations. J. Chem. Phys. 2001, 115, 1125–1136. (37) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327–341. (38) Eichenberger, A. P.; Allison, J. R.; Dolenc, J.; Geerke, D. P.; Horta, B. A. C.; Meier, K.; Oostenbrink, C.; Schmid, N.; Steiner, D.; Wang, D.; van Gunsteren, W. F. The GROMOS++ Software for the Analysis of Biomolecular Simulation Trajectories. J. Chem. Theory Comput. 2011, 7, 3379–3390. (39) Grunewald, G. L.; Dahanukar, V. H.; Teoh, B.; Criscione, K. R. 3,7-Disubstituted1,2,3,4-tetrahydroisoquinolines Display Remarkable Potency and Selectivity as Inhibitors of Phenylethanolamine N-Methyltransferase versus the α2 -Adrenoceptor1a . J. Med. Chem. 1999, 42, 1982–1990. (40) Grunewald, G. L.; Seim, M. R.; Regier, R. C.; Martin, J. L.; Gee, C. L.; Drinkwater, N.; Criscione, K. R. Comparison of the Binding of 3-Fluoromethyl-7-sulfonyl1,2,3,4-tetrahydroisoquinolines with Their Isosteric Sulfonamides to the Active Site of Phenylethanolamine N-Methyltransferase 1. J. Med. Chem. 2006, 49, 5424–5433. (41) Gee, C. L.; Drinkwater, N.; Tyndall, J. D.; Grunewald, G. L.; Wu, Q.; McLeish, M. J.; Martin, J. L. Enzyme Adaptation to Inhibitor Binding: A Cryptic Binding Site in Phenylethanolamine N-Methyltransferase. J. Med. Chem. 2007, 50, 4845–4853.

32

ACS Paragon Plus Environment

Page 32 of 33

Page 33 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

33

ACS Paragon Plus Environment