Extended Thermodynamic Integration: Efficient Prediction of Lambda

Aug 5, 2016 - Thermodynamic integration (TI) is one of the most commonly used free-energy calculation methods. The derivative of the Hamiltonian with ...
3 downloads 0 Views 17MB Size
Subscriber access provided by CORNELL UNIVERSITY LIBRARY

Article

Extended Thermodynamic Integration: efficient prediction of lambda derivatives at non-simulated points Anita de Ruiter, and Chris Oostenbrink J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.6b00458 • Publication Date (Web): 05 Aug 2016 Downloaded from http://pubs.acs.org on August 21, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Extended  Thermodynamic   Integration:  efficient  prediction  of   lambda  derivatives  at  non-­‐simulated   points   Anita  de  Ruiter  &  Chris  Oostenbrink     Institute  for  Molecular  Modeling  and  Simulation   University  of  Natural  Resources  and  Life  Sciences  (BOKU)   Vienna,  Austria                                                    

 

Last  modified:  04.08.16    

ACS Paragon Plus Environment

  1  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 27

Abstract   Thermodynamic   Integration   (TI)   is   one   of   the   most   commonly   used   free-­‐energy   calculation   methods.  The  derivative  of  the  Hamiltonian  with  respect  to  lambda,  ,  is  determined  at   multiple   λ-­‐points.   Because   a   numerical   integration   step   is   necessary,   high   curvature   regions   require  simulations  at  densely  spaced  λ-­‐points.  Here,  the  principle  of  extended  TI  is  introduced,   where    values  are  predicted  at  non-­‐simulated  λ-­‐points.  Based  on  three  model  systems,   it   is   shown   that   extended   TI   requires   significantly   fewer   λ-­‐points   than   regular   TI,   to   obtain   similar  accuracy.  

Introduction   Over  the  last  few  decades,  molecular  dynamics  (MD)  simulations  have  taken  an  important  role  in   multiple  research  fields  such  as  physics,  material  sciences,  biotechnology  and  structural  biology.   Whereas  experimental  studies  mostly  give  macroscopic  insights,  MD  simulations  have  the  great   advantage   that   ligands,   proteins,   DNA   and   other   (bio)materials   can   be   studied   at   a   molecular   level.  In  combination  with  free-­‐energy  calculations,  MD  simulations  become  even  more  powerful.   This  is  especially  true  for  the  field  of  drug  design  where  one  can  e.g.  predict  binding  affinities  of   different   ligands   towards   a   protein   or   the   influence   of   a   mutation   in   the   protein   on   ligand   binding.   The   challenge   in   drug   design   is   to   achieve   the   high   accuracy   that   is   required   to   differentiate   the   binding   affinities   of   closely   related   molecules   and   at   the   same   time   keep   the   computational   costs   as   low   as   possible,   since   often   a   large   number   of   ligands   needs   to   be   evaluated.1–4   Also,   the   methods   should   be   easy   to   use   and   require   as   little   human   input   as   possible  to  allow  for  automated  workflows.4     Although  end-­‐point  and  pulling  methods  are  also  applied  for  the  calculation  of  (relative)  binding   free   energies,   the   rigorous   alchemical   free-­‐energy   methods   Thermodynamic   Integration   (TI)5   and  Bennett’s  acceptance  ratio  (BAR)6,7  are  arguably  the  most  appropriate  ones.     TI   determines   the   free   energy   difference   between   a   state   A   and   B   by   making   use   of   multiple   intermediate  states  as  defined  by  a  coupling  parameter  λ.  The  Hamiltonian  is  dependent  on  λ  and   results   in   the   Hamiltonian   for   state   A   when   λ=0   and   state   B   when   λ=1.   Simulations   are   performed   at   several   λ   values   and   the   ensemble   averages   of   the   derivative   of   the   Hamiltonian   with  respect  to  λ  are  integrated  numerically  to  obtain  the  free  energy  difference.  

 

 

ACS Paragon Plus Environment

2  

 

Page 3 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

! !" ∆𝐺!"

= !

𝑑𝐺 λ = 𝑑λ

!

!

𝜕𝐻 λ 𝜕λ

𝑑λ   !

eq.  1  

 

The  overall  variance  of  a  free  energy  difference  obtained  by  TI  is  dependent  on  the  curvature  of   the    as  function  of  λ.8  This  means  that  smoother  TI  curves  lead  to  more  accurate  free   energy   differences.   The   fact   that   numerical   integration   is   used   for   TI   also   makes   clear   that   regions   of   high   curvature   require   a   more   dense   spacing   of   simulated   λ-­‐points.   Typically   one   starts  with  an  equidistant  spacing  and  adds  additional  simulations  if  required,  as  the  shape  of  the   curve  is  not  known  a  priori.9   BAR  determines  the  free  energy  difference  between  two  values  λ  and  λ+Δλ  by     !"# ∆𝐺!,!!!" = 𝑘! 𝑇 ln

𝑓 𝐻 λ − 𝐻 λ + 𝛥λ + 𝐶 !!!! + 𝐶   𝑓 𝐻 λ + 𝛥λ − 𝐻 𝜆 − 𝐶 !

eq.  2  

 

where   f x =

1 .   1 + e!/!!!

A  self-­‐consistent  solution  towards  C  is  iteratively  sought,  such  that  the  ensemble  averages  in  eq.   2   are   equal   to   each   other,   which   directly   gives   the   free   energy   difference.   Note   that   the   differences   in   the   Hamiltonian   are   calculated   from   simulations   of   both   λ-­‐points   and   that   a   reevaluation   of   the   trajectory   to   obtain   the   Hamiltonian   for   the   other   λ-­‐point   is   necessary.   The   use  of  multiple  simulated  states  simultaneously  slightly  improves  the  efficiency  and  leads  to  the   definition  of  MBAR.10       Several   papers   have   compared   these   two   methods   in   terms   of   efficiency,   accuracy   and   ease   of   use.9,11,12  BAR  seems  to  be  the  most  efficient  one  in  terms  of  the  number  of  simulations  that  are   required,   but   it   requires   more   disk   space   and   post-­‐processing   time.   This   is   especially   true   if   additional   simulations   at   intermediate   λ-­‐points   are   required,   because   the   simulations   of   the   surrounding  λ-­‐points  need  to  be  reevaluated.  On  the  other  side,  TI  requires  more  simulations  in   order  to  get  a  smooth  curve  for  numerical  integration,  but  only  very  little  post-­‐processing  which   is,  moreover,  independent  of  the  neighboring  λ-­‐points.   In  this  paper  we  describe  a  way  to  predict    values  for  surrounding  λ  points.  This  allows   the   calculation   of   accurate   free-­‐energy   differences   with   fewer   λ-­‐points   than   was   previously   possible   with   TI.   Similar   predictions   were   already   performed   by   Tidor   et   al.   to   improve   the  

 

 

ACS Paragon Plus Environment

3  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 27

quadrature  of  the  TI  curve  for  λ-­‐points  deviating  by  0.025  and  0.05  from  the  simulated  values.13   However,  the  potential  energy  function  that  was  used  in  that  work,  was  a  linear  combination  of   those   of   the   two   end   states,   leading   to   the   well-­‐known   end-­‐state   problems.14   Li   and   Yang   also   describe  a  reweighting  method  to  obtain    values  for  other  λ-­‐points,  named  λ-­‐WHAM.15   This  method  was  based  on  a  linear  combination  of  the  end  states,  which  makes  the  calculation  of   the  energy  differences  between  two  λ-­‐points  straightforward.  In  the  current  study,  however,  soft-­‐ core   potentials   are   used,   which   makes   the   prediction   of   derivatives   at   non-­‐simulated   λ-­‐points   more   difficult.   Furthermore,   we   will   make   predictions   over   larger   ranges   of   λ.   In   the   following   sections,   we   will   show   the   theory   behind   this   approach,   which   we   refer   to   as   extended   TI,   its   application  to  three  different  model  systems  in  water,  as  well  as  one  more  challenging  protein-­‐ ligand   system.   Additionally,   the   accuracy,   efficiency   and   user   friendliness   will   be   discussed,   as   well  as  further  possibilities  of  the  code  that  are  still  to  be  exploited.  

Theory   As  was  briefly  pointed  out  in  the  introduction,  the  goal  of  this  study  is  to  optimize  TI  calculations,   such   that   fewer   λ-­‐points   need   to   be   simulated   to   achieve   reliable   free-­‐energy   differences.   The   idea  behind  this  is  shown  in   Figure  1.  From  a  simulation  at  e.g.  λ=0.2  we  will  predict  what  the   ensemble  average  of  the  derivative  of  the  Hamiltonian  with  respect  to  λ  would  be  at  other,  non-­‐ simulated,  λ-­‐points.     We   will   first   focus   on   the   non-­‐bonded   potential   energy   terms,   as   these   functions   are   the   most   complex   due   to   the   soft-­‐core   potentials.14   In   GROMOS,   the   coupling   parameter   (µμ)   of   each   interaction  type,  x,  can  be  set  individually  based  on  a  fourth  order  polynomial  of  λ:16,17   𝜇! = 𝑎! λ! + 𝑏! λ! + 𝑐! λ! + 𝑑! λ + 𝑒!  

eq.  3  

As   an   aside,   the   individual   µμx-­‐values,   may   furthermore   be   specified   separately   for   interactions   between  different  groups  of  atoms.  At  the  moment  there  are  twelve  interaction  types  that  can  be   treated  with  individual  µμ-­‐values,  four  of  which  are  relevant  for  the  non-­‐bonded  potential  energy.  

µμlj  and  µμcrf  scale  the  Lennard  Jones  and  Coulomb-­‐reaction  field  interactions,  respectively,  whereas   µμslj   and   µμscrf   scale   the   corresponding   softness.   Usually,   the   parameters   of   eq.   3   are   chosen   such   that  µμx  =  0  for  λ=0  and  µμx=1  for  λ=1.  

 

 

ACS Paragon Plus Environment

4  

 

Page 5 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The   λ   dependent   Lennard   Jones   potential   energy   (VLJ)   and   Coulomb-­‐reactionfield   interaction   (VCRF),  can  hence  be  written  as   𝑉!" 𝒓, 𝑡, λ =

1 − 𝜇!"

!

! ! ! 𝑉!" 𝑟!" , 𝑡, 𝜇!"# + 𝜇!" 𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"#  

eq.  4  

!,!

and   𝑉!"# 𝒓, 𝑡, λ =

1 − 𝜇!"#

!

! ! ! 𝑉!"# 𝑟!" , 𝑡, 𝜇!"#$ + 𝜇!"# 𝑉!"# 𝑟!" , 𝑡, 1 − 𝜇!"#$  

eq.  5  

!,!

where    the  sums  run  over  all  interacting  atom  pairs  i,j  ,  rij  is  the  distance  between  the  atoms  i  and  

j  and  n  is  an  integer  as  set  by  the  user  (power  dependence  of  µμ).  𝑉!"!  represents  the  Lennard  Jones   interaction  energy  for  either  end  state  A  or  B:     ! 𝑉!" 𝑟!" ; 𝑡; 𝜇!"# =

! 𝐶!" (𝑖, 𝑗) 1 − 𝐶!! (𝑖, 𝑗) ∙   ! ! ! ! ! 𝛼!" 𝜇!"# 𝐶!"# 𝑖, 𝑗 + (𝑟!" ) 𝛼!" 𝜇!"# 𝐶!"# 𝑖, 𝑗 + (𝑟!" )!

eq.  6  

! where  𝐶!" 𝑖, 𝑗  and  𝐶!! 𝑖, 𝑗  denote   the   Lennard   Jones   parameters   of   atom   i   and   j   in   state   X,   ! ! 𝐶!"# 𝑖, 𝑗 = 𝐶!" 𝑖, 𝑗 /𝐶!! 𝑖, 𝑗    and  αlj  is  the  softness  parameter  of  the  Lennard  Jones  interaction.   ! Similarly,  𝑉!"#  represents   the   Coulomb   reaction   field   interaction   energy   in   state   A   or   B   and   is  

defined  by   ! 𝑉!"# 𝑟!" ; 𝑡; 𝜇!"#$ =

𝑞!! 𝑞!! 4𝜋𝜀!

1 ! 𝛼!"# 𝜇!"#$ +

! 𝑟!"!

!

− !

! !𝐶!" 𝑟!"

! ! 𝛼!"# 𝜇!"#$ + 𝑅!"

! !



1 − ! !𝐶!"   𝑅!"

eq.  7  

  where  𝑞!!  and  𝑞!!  are  the  partial  charges  on  atoms  i  and  j  in  state  X,  ε0  is  the  dielectric  permittivity   of  vacuum,   αcrf  is  the  softness  parameter  of  the  Coulomb-­‐reaction  field  interaction  and   Crf  and   Rrf   are  parameters  of  the  reaction  field  contribution.     We  will  now  slightly  rewrite  eq.  4  into     𝑉!" 𝒓, 𝑡, 𝜆 = 1 − 𝜇!"

!

! ! 𝑉!" 𝑟!" , 𝑡, 𝜇!"# + 𝜇!" !,!

! 𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"# .  

eq.  8  

!,!

The  derivative  of  the  Lennard  Jones  potential  energy  with  respect  to  λ  is  then  

 

 

ACS Paragon Plus Environment

5  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

𝜕𝑉!" 𝒓, 𝑡, λ = −𝑛 1 − 𝜇!" 𝜕λ

+

!!!

1 − 𝜇!"

!!! ! 𝑉!" 𝑟!" , 𝑡, 𝜇!"# + 𝑛𝜇!" !,!

! !,!

Page 6 of 27

! 𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"# !,!

! 𝜕𝑉!" 𝑟!" , 𝑡, 𝜇!"# ! − 𝜇!" 𝜕𝜇!"#

! 𝜕𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"# !,!

𝜕 1 − 𝜇!"#

𝑑𝜇!" 𝑑λ

eq.  9  

𝑑𝜇!"#   𝑑λ

We  can  now  define   ! 𝐸!" 𝒓, 𝑡, 𝜇!"# =

𝑉!!! 𝑟!" , 𝑡, 𝜇!"#

eq.  10  

!,! ! 𝐸!" 𝒓, 𝑡, 𝜇!"# =

! 𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"# !,!

! 𝑑𝐸!" 𝒓, 𝑡, 𝜇!"# = !,! ! 𝑑𝐸!" 𝒓, 𝑡, 𝜇!"# = !,!

! 𝜕𝑉!" 𝑟!" , 𝑡, 𝜇!"# 𝜕𝜇!"#

 

! 𝜕𝑉!" 𝑟!" , 𝑡, 1 − 𝜇!"# 𝜕𝜇!"#

such  that  eq.  8  and  eq.  9  become   !

𝑉!" 𝒓, 𝑡, 𝜆 = 1 − 𝜇!"

! ! ! 𝐸!" 𝒓, 𝑡, 𝜇!"# + 𝜇!" 𝐸!" 𝒓, 𝑡, 𝜇!"#  

eq.  11  

and   𝜕𝑉!" 𝒓, 𝑡, λ = −𝑛 1 − 𝜇!" 𝜕λ +

!!!

1 − 𝜇!"

!

!!! ! ! 𝐸!" 𝒓, 𝑡, 𝜇!"# + 𝑛𝜇!" 𝐸!" 𝒓, 𝑡, 𝜇!"#

! ! ! 𝑑𝐸!" 𝒓, 𝑡, 𝜇!"# − 𝜇!! 𝑑𝐸!" 𝒓, 𝑡, 𝜇!"#

𝑑𝜇!" 𝑑𝜆

eq.  12  

𝑑𝜇!"#   𝑑𝜆

  Similarly,   we   can   reorganize   eq.   5,   and   define   similar   quantities   for   the   Coulomb-­‐reaction   field   interaction:   ! 𝐸!"# 𝒓, 𝑡, 𝜇!"#$ =

! 𝑉!"# 𝑟!" , 𝑡, 𝜇!"#$

eq.  13  

!,! ! 𝐸!"# 𝒓, 𝑡, 𝜇!"#$ =

! 𝑉!"# 𝑟!" , 𝑡, 1 − 𝜇!"#$ !,!

! 𝑑𝐸!"# 𝒓, 𝑡, 𝜇!"#$ = !,! ! 𝑑𝐸!"# 𝒓, 𝑡, 𝜇!"#$ = !,!

! 𝜕𝑉!"# 𝑟!" , 𝑡, 𝜇!"#$ 𝜕𝜇!"#$

 

! 𝜕𝑉!"# 𝑟!" , 𝑡, 1 − 𝜇!"#$ 𝜕𝜇!"#$

resulting  in:   𝑉!"# 𝒓, 𝑡, λ = 1 − 𝜇!"#

!

! ! ! 𝐸!"# 𝒓, 𝑡, 𝜇!"#$ + 𝜇!"# 𝐸!"# 𝒓, 𝑡, 𝜇!"#$  

eq.  14  

and    

 

 

ACS Paragon Plus Environment

6  

 

Page 7 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

𝜕𝑉!"# 𝒓, 𝑡, λ = −𝑛 1 − 𝜇!"# 𝜕λ +

1 − 𝜇!"#

!!!

!

!!! ! ! 𝐸!"# 𝒓, 𝑡, 𝜇!"#$ + 𝑛𝜇!"# 𝐸!"# 𝒓, 𝑡, 𝜇!"#$

! ! ! 𝑑𝐸!"# 𝒓, 𝑡, 𝜇!"#$ − 𝜇!"# 𝑑𝐸!"# 𝒓, 𝑡, 𝜇!"#$

𝑑𝜇!"# 𝑑𝜆

eq.  15  

𝑑𝜇!"#$   𝑑𝜆

  Up  to  this  point,  the  above  formulas  have  only  been  slightly  reorganized.  To  allow  for  an  efficient   prediction   of     at   other   λ-­‐points   (or   other   parameterizations   of   individual   µμ),   the   quantities  in  eq.  10  will  not  only  be  calculated  for  µμslj  according  to  the  simulated  λ-­‐point,  but  for  a   ! range  of  values,   e.g.   𝜇!"# = 0, 0.01, 0.02, … , 1 .  These  additional  calculations  can  be  performed  

efficiently   because   at   the   time   of   calculation   the   pairlist   has   already   been   generated   and   the   lookup  of  several  parameters  (e.g.  C12  and  C6)  does  not  have  to  be  repeated.  The  sets  of  quantities   ! ! ! ! ! ! ! ! 𝐸!" 𝒓, 𝑡, 𝜇!"#  and  𝐸!" 𝒓, 𝑡, 𝜇!"#  as   well   as  𝑑𝐸!" 𝒓, 𝑡, 𝜇!"#  and  𝑑𝐸!" 𝒓, 𝑡, 𝜇!"#  will   be   written   to   the  

energy   and   free   energy   trajectory   files,   respectively.   The   total   VLJ(r,t,λ)   and   ∂VLJ(r,t,λ)/∂λ   for   other  λ  values  (or  other  parameterizations  of  the  individual  µμ  values)  can  be  easily  reconstructed   from  these  energy  and  free  energy  trajectories,  since  the  remaining  factors  in  eq.  11  and  eq.  12   are   not   dependent   on   the   configuration   of   the   system.   A   very   similar   procedure   is   followed   for   the  Coulomb-­‐reaction  field  interactions,  by  storing  the  properties  of  eq.  13  for  different  values  of  

µμscrf.  Note  that  for  the  application  described  in  the  current  paper  not  all  energy  and  free  energy   terms  of  eq.  10  and  eq.  13  (and  the  following  terms  for  the  kinetic  and  bonded  interactions)  need   to   be   written   to   disk   separately.   However   for   future   applications   this   flexibility   will   be   very   useful,  see  Discussion  for  more  details.     Changes  of  atom  masses  as  well  as  covalent  interactions  are  easier  to  implement,  since  these  are   only   influenced   by   a   single   µμx   parameter.   This   also   means   that   the   corresponding   EX   and   dEX   quantities   do   not   have   to   be   written   out   separately   for   state   A   and   B.   The   kinetic   energy   contributions  are  determined  by  the  quantity   𝐸! 𝒑, 𝒓, 𝑡, 𝜇! = !

! 𝑚 ! !

𝜇! (𝐯! (𝑡))!  

eq.  16  

with   𝑚! 𝜇! = 1 − 𝜇! 𝑚!! + 𝜇! 𝑚!!   where  𝑚!!  is   the   mass   of   perturbed   atom   i   in   state   X,   vi   is   the   velocity   of   atom   i   and   µμK   is   the   individual   µμ   for   the   kinetic   energy   contributions.   However,   when   calculating   the   kinetic   energy  

 

 

ACS Paragon Plus Environment

7  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 27

contributions   for   other   µμK   values   as   the   simulation   is   performed   with,   one   has   to   keep   in   mind   ! that  the  momenta,  rather  than  the  velocities,  are  conserved:  𝒑 = 𝑚 𝜆! 𝐯 ! = 𝑚 𝜆!! 𝐯 ! .  Thus,  the  

predicted  kinetic  energy  contributions  for  𝜇!!  from  a  simulation  at  𝜇!!  is  calculated  by  

!

!

𝑚! 𝜇!! 𝑚! 𝜇!!

! !

𝐸! 𝒑, 𝒓, 𝑡, 𝜇!! =

eq.  17  

(𝐯! (𝑡))!  

where   the   velocities   (and   coordinates)   correspond   to  𝜇!! .   For   the   derivative   with   respect   to   µμK,   we  have  to  consider  the  kinetic  energy  as  a  function  of  the  momenta,  rather  than  of  the  velocities,   which  results  in     ! !

𝑑𝐸! 𝒑, 𝒓, 𝑡, 𝜇!! = − !

𝑚!! − 𝑚!!

𝐯! (𝑡) !  

eq.  18  

Again,   for   predictions   of   dEK   at   µμK   values   other   than   the   simulated   one,   the   velocities   need   be   scaled,  because  only  the  momenta  are  preserved:   𝑑𝐸! 𝒑, 𝒓, 𝑡, 𝜇! = − !

! !

𝑚! 𝜇!! 𝑚! 𝜇!!

𝑚!! − 𝑚!!

!

eq.  19  

𝐯! (𝑡) !  

  A  quartic  potential  energy  term  is  used  to  describe  the  covalent  bond  stretching  interaction   𝐸! 𝒓, 𝑡, 𝜇! = ! ! !

where  𝑘!

! !

! !

1 − 𝜇! 𝑘!

! !

+ 𝜇! 𝑘!

𝑏!! (𝑡) −

1 − 𝜇! 𝑏!!! + 𝜇! 𝑏!!!

! !

 

eq.  20  

 is   the   force   constant   for   perturbed   bond   n   in   state   X,   bn   is   the   current   bond   length  

and  𝑏!!!  is  the  ideal  bond  length  of  bond   n  for  state  X.  The  corresponding  derivative  with  respect   to  µμb  is   dE! 𝒓, 𝑡, 𝜇! = !

! !

! !

−4 𝑘!

! !

+ 𝜇! 𝑘!

∙ 𝑏!!! + 𝜇! 𝑏!!! − 𝑏!!! ! !

+ 𝑘!

! !

− 𝑘!

! !

− 𝑘!

𝑏!!! − 𝑏!!!

eq.  21  

𝑏!! (𝑡) − 𝑏!!! + 𝜇! 𝑏!!! − 𝑏!!!

𝑏!! (𝑡) − 𝑏!!! + 𝜇! 𝑏!!! − 𝑏!!!

! !

!

.  

  The  covalent  bond  angle  bending  interactions  are  described  by   𝐸! 𝒓, 𝑡, 𝜇! = !

! !

! !

1 − 𝜇! 𝑘!

∙ cos 𝜃! 𝑡 −

 

! !

+ 𝜇! 𝑘!

1 − 𝜇! cos 𝜃!!! + 𝜇! cos 𝜃!!!

 

ACS Paragon Plus Environment

eq.  22   !

 

8  

 

Page 9 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

! !

where  𝑘!

 is  the  bond  angle  force  constant  for  bond   n  in  state  X,   θn  is  the  current  bond  angle  

and  𝜃!!!  is  the  optimal  bond  angle  for  bond  n  in  state  X.  Its  derivative  with  respect  to  µμθ  is   𝑑𝐸! 𝒓, 𝑡, 𝜇! = !

! !

! !

−2 𝑘!

! !

+ 𝜇! 𝑘!

! !

cos 𝜃!!! − cos 𝜃!!!

− 𝑘!

eq.  23  

∙ cos 𝜃! (𝑡) − cos 𝜃!!! − 𝜇! cos 𝜃!!! − cos 𝜃!!! ! !

+ 𝑘!

! !

cos 𝜃! (𝑡) − cos 𝜃!!! − 𝜇! cos 𝜃!!! − cos 𝜃!!!

− 𝑘!

!

.  

  The  dihedral-­‐angle  torsion  interactions  are  described  by   ! !

𝐸! 𝒓, 𝑡, 𝜇! =

1 − 𝜇! 𝑘!

! !

1 + cos 𝑚!

𝜑! (𝑡) − 𝜑!!!

eq.  24  

! ! !

+ 𝜇! 𝑘! ! !

where  𝑘!

! !

1 + cos 𝑚!

𝜑! (𝑡) − 𝜑!!!   (!)!

 is  the  force  constant  of  dihedral  n  in  state  X,  𝑚!

 is  the  multiplicity  of  dihedral  n  in  

state  X,  φn  is  the  current  dihedral  angle  and  𝜑!!!  is  the  phase  shift.  We  can  define  the  quantities   𝐸!! 𝒓, 𝑡 =

! !

1 + cos 𝑚!

! !

1 + cos 𝑚!

𝑘!

! !

𝜑! (𝑡) − 𝜑!!!

! !

𝜑! (𝑡) − 𝜑!!!

!

𝐸!! 𝒓, 𝑡 =

𝑘!

eq.  25  

 

!

and  the  derivative  with  respect  to  µμφ  for  dihedral-­‐angle  torsion  interactions  is   ! !

dE! 𝒓, 𝑡 =

𝑘!

! !

1 + cos 𝑚!

𝜑! (𝑡) − 𝜑!!!

eq.  26  

! ! !

− 𝑘!

! !

1 + cos 𝑚!

𝜑! (𝑡) − 𝜑!!!

.  

This  quantity  is  independent  of  µμφ  and  thus  only  needs  to  be  calculated  once.     Similar  to  the  quantities  in  eq.  10  and  eq.  13  the   Ex  and   dEx  quantities  of  the  kinetic  and  covalent   interactions   are   written   to   the   energy   and   free   energy   trajectory   file,   respectively,   for   𝜇!! = 0, 0.01, 0.02, … , 1 .  We  then  have  for  the  kinetic  and  covalent  interactions   𝐾 𝒑, 𝒓, 𝑡, λ = 𝐸! 𝒑, 𝒓, 𝑡 𝑉! 𝒓, 𝑡, λ = 𝐸! 𝒓, 𝑡, 𝜇!   𝑉! 𝒓, 𝑡, λ = 𝐸! 𝒓, 𝑡, 𝜇! 𝑉! 𝒓, 𝑡, λ = 1 − 𝜇! 𝐸!! 𝒓, 𝑡 + 𝜇! 𝐸!! 𝒓, 𝑡

eq.  27  

and  

 

 

ACS Paragon Plus Environment

9  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 27

𝜕𝐾 𝒑, 𝒓, 𝑡, λ 𝜕𝜇! = 𝑑𝐸! 𝒑, 𝒓, 𝑡, 𝜇! 𝜕λ 𝜕λ 𝜕𝑉! 𝒓, 𝑡, λ 𝜕𝜇! = 𝑑𝐸! 𝒓, 𝑡, 𝜇! 𝜕λ 𝜕λ   𝜕𝑉! 𝒓, 𝑡, λ 𝜕𝜇! = 𝑑𝐸! 𝒓, 𝑡, 𝜇! 𝜕λ 𝜕λ 𝜕𝑉! 𝒓, 𝑡, λ 𝜕𝜇! = 𝑑𝐸! 𝒓, 𝑡, 𝜇! 𝜕λ 𝜕λ

eq.  28  

  After   determining   the   contributions   of   all   the   interaction   types   from   a   simulation   at   λS,   we   can   reconstruct   the   total   energy  𝐻 𝒓, 𝑡, λ!

!!  ,  

as   well   as   its   derivative  𝜕𝐻 𝒓, 𝑡, λ!

𝜕λ

!!  for  

any  

other  value  of  λ,  indicated  by  λP:   𝐻 𝒓, 𝑡, λ!

!!

= 𝑉!" 𝒓, 𝑡, λ!

!!

+ 𝑉!"# 𝒓, 𝑡, λ!

+ 𝑉! (𝒓, 𝑡, λ! )

!!

+ 𝐾(𝒑, 𝒓, 𝑡, λ! )

!!

+ 𝑉! (𝒓, 𝑡, λ! )

!!

!!

+ 𝑉! (𝒓, 𝑡, λ! )

!!

eq.  29  

 

and   𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ

= !!

𝜕𝑉!" 𝒓, 𝑡, λ! 𝜕λ +

+ !!

𝑑𝑉! (𝒓, 𝑡, λ! ) 𝜕λ

where   we   use   the   notation  

!!  to  

!!

𝜕𝑉!"# 𝒓, 𝑡, λ! 𝜕λ

+

𝑑𝑉! (𝒓, 𝑡, λ! ) 𝜕λ

+ !!

!!

+

𝑑𝐾(𝒑, 𝒓, 𝑡, λ! ) 𝜕λ

𝑑𝑉! (𝒓, 𝑡, λ! ) 𝜕λ

eq.  30   !!

  !!

indicate   that   the   preceding   quantity   was   obtained   from   a  

simulation   at   λS.   Hence,   the   estimate   𝜕𝐻 𝒓, 𝑡, λ!

𝜕λ

!!

 has   been   calculated   from   a  

configurational  ensemble  obtained  at  λS  and  should  therefore  be  reweighted  to  get  the  ensemble   average  appropriate  for  λP:18  

𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ

𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ = !! !!

𝑒! !

!!

𝑒! !

!,!,!! !! !,!,!!

eq.  31  

!! ! !!

!,!,!! !! !,!,!!

!! !

 

!!

  The  focus  in  this  paper  will  be  on  the  prediction  of    at  λ  values  not  equal  to  λS,  without   different  parameterizations  of  the  individual  µμ  values.  From  the  pre-­‐computed  quantities  in  eqs.   10,   13,   17   and   19–26   for   𝜇!! = 0, 0.01, 0.02, … , 1  we   can   thus   predict     for   100   λP   values  ranging  from  0  to  1  (see  Figure  1).  The  explicit  calculation  of  both  the  Hamiltonians  and   its   λ-­‐derivative,   including   a   softcore   potential,   is   the   main   difference   compared   to   previously   described  reweighting  schemes.13,15  Both  these  approaches  do  not  include  softcore  potentials  and  

 

 

ACS Paragon Plus Environment

10  

 

Page 11 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Li  and  Yang  estimate  the  energy  difference  in  the  exponential  of  eq.  31  through  the  derivative  at   λS,  which  is  not  very  accurate  for  non-­‐linear  λ-­‐dependencies..,  The  reweighting  in  eq.  31  is  only   reliable  if  the  difference  between  λS  and  λP  is  not  too  large,  such  that  there  is  sufficient  overlap  in   the  phase  space.    

Combining  predictions   As   pointed   out   in   the   previous   section,   it   is   in   principle   possible   to   predict     for   all   λ   values  between  0  and  1  from  a  single  simulation.  However,  the  reliability  of  this  prediction  is  low   for   large   differences   between   λS   and   λP   because   the   phase   space   overlap   is   likely   to   be   insufficient.  The  obvious  solution  for  this  is  to  simulate  at  multiple  λ  values,  such  that  the  phase   space   overlap   between   neighboring   states   is   improved.   Each   simulation   can   still   predict     for   the   full   range   of   λ-­‐points   or   for   a   subset   just   up   to   the   neighboring   λ-­‐points.   The   predictions   in   the   region   λS1   <   λP   <   λS2   will   be   calculated   based   on   the   predictions   from   the   neighboring  simulations  λS1  and  λS2  through  a  linear  weighting  scheme:   𝑤!! =

λ! − λ!! λ!! − λ!!

𝑤!! =

λ! − λ!!   λ!! − λ!!

eq.  32  

where  wS1  and  wS2  are  the  weighting  factors  for  the    predictions  from  the  simulations   at  λS1  and  λS2,  respectively,   𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ

= 𝑤!! !!

𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ

+ 𝑤!! !! (!!! )

𝜕𝐻 𝒓, 𝑡, λ! 𝜕λ

 

eq.  33  

!! (!!! )

 

Choice  of  λS  values  

One  of  the  disadvantages  of  TI,  as  already  pointed  out  in  the  introduction,  is  that  the  shape  of  the    curve  is  not  known  a  priori.  This  makes  the  initial  choice  of  λ  values  to  simulate  rather   arbitrary  and  can  lead  to  unnecessary  simulations  in  regions  of  low  curvature.  On  the  other  hand,   in   regions   where   the   curvature   is   high,   multiple   simulations   have   to   be   performed   in   a   very   narrow  region  of  λ-­‐points  in  order  to  capture  this  shape  with  the  numerical  integration.  Typically   one  starts  TI  with  11  or  21  equidistant  λ-­‐points  and  adds  points  where  needed.     With  extended  TI,  one  can  either  use  a  similar  scheme  as  for  regular  TI  (i.e.  equidistant  points),   or  determine  which  λ-­‐point  would  be  the  best  to  simulate  next  based  on  the  predictions  from  the   previously   simulated   λ-­‐points.   In   both   approaches,   fewer   λ-­‐points   will   be   required   to   reach  

 

 

ACS Paragon Plus Environment

11  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 27

accurate  results,  because  the  predictions  at  intermediate  λ-­‐points  smooth  the    curve  for   numerical  integration.  In  an  attempt  to  optimize  the  choice  of  λ-­‐points  for  the  most  accurate  free   energy  differences,  we  suggest  the  following  scheme:   1.

Simulate  at  λ=0  and  λ=1    

2.

From  each  simulation,  predict    for  all  λ-­‐points    

3.

For  every  value  of  λP  calculate  the  absolute  difference  between  the  predictions  from   each  of  the  neighboring  λS  

4.

Weight  the  differences  by  𝑤 = λ!! − λ!! − λ!! − λ! − λ! − λ!!  

5.

Find  the  absolute  maximum  of  the  weighted  difference.  The  corresponding  λP-­‐point   is  the  next  λ-­‐point  to  simulate    

6.

Repeat  from  step  2  

The   idea   behind   this   approach   is   to   introduce   an   extra   λS   where   the   prediction   is   the   least   reliable,  i.e.  where  the  neighboring  predictions  disagree  most.  The  weighting  step  (4)  lowers  the   significance   of   differences   that   are   very   close   to   one   of   the   simulated   states,   because   the   prediction   from   the   closest   simulated   state   is   very   reliable   and   we   thus   do   not   need   an   extra   simulation   at   this   point.   Other   selection   schemes   could   be   based   on   the   variance   of   ,   where  λ-­‐points  would  be  added  ‘nearby’  the  highest  variance  λ-­‐point.19  This  requires,  however,   multiple   initial   simulations   and   the   actual   choice   of   a   ‘nearby’   λ-­‐point.   The   advantage   of   our   scheme  is  that  the  curvature  at  nonsimulated  λ-­‐points  is  captured  very  quickly  and  additional  λ-­‐ points  can  be  added  accordingly.   Whenever  non-­‐equidistant  λ-­‐points  are  mentioned  below,  these  were  determined  by  the  above-­‐ mentioned  scheme,  unless  explicitly  stated  differently.  

Methods   Three   different   alchemical   free   energy   transformations   are   performed   to   test   and   demonstrate   the  use  of  extended  TI;  (1)  Methanol  to  dummy  atoms,  (2)  phenol  to  dummy  atoms  and  (3)  p-­‐ nitrobenzamidine  to  p-­‐methoxybenzamidine  (see  Figure  2).  In  addition  to  these  transformations   in  water,  the  benzamidine  transformation  is  also  performed  with  the  molecule  bound  to  trypsin.   All  molecular  dynamics  simulations  in  this  study  have  been  performed  with  a  modified  version  of   GROMOS1120   in   combination   with   the   54a721,22   (methanol   and   phenol)   or   the   53a623  

 

 

ACS Paragon Plus Environment

12  

 

Page 13 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(benzamidine)   force   field   parameter   set.   We   first   describe   the   simulation   settings   used   for   the   model   systems   in   water.  Cubic   simulation   boxes   were   used,   together   with  800-­‐1500   SPC   water   molecules   for   the   simulations   in   solution.24   A   chloride   ion   is   included   in   the   benzamidine   simulations  to  neutralize  the  system.  The  systems  are  weakly  coupled25  to  an  external  bath  with   a   temperature   of   300   K   (298   K   for   benzamidine)   and   a   coupling   time   of   0.1   ps,   with   separate   baths   for   the   solute   and   solvent   degrees   of   freedom.   The   pressure   is   kept   constant   at   1   atm   by   isotropic  weak  coupling25  with  a  relaxation  time  of  0.5  ps  and  a  compressibility  of  7.624  x  10-­‐4  (kJ   mol-­‐1  nm-­‐3)-­‐1  for  methanol  and  phenol,  and  of  4.575  x  10-­‐4  (kJ  mol-­‐1  nm-­‐3)-­‐1  for  benzamidine.  The   center  of  mass  movement  of  the  solute  is  removed  every  1000  steps.  A  triple  range  cutoff  scheme   is  used  to  calculate  the  longe  range  interactions,  for  which  a  pairlist  is  generated  every  5th  time   step.  Interactions  within  0.8  nm  are  calculated  at  every  time  step,  according  to  the  pairlist.  The   interactions  between  0.8  and  1.4  nm  are  calculated  only  together  with  the  pairlist  update  and  are   kept  constant  in  between.  A  reaction  field  contribution26  is  added  to  the  electrostatic  interactions   and  forces  to  account  for  a  homogeneous  medium  beyond  1.4  nm,  using  a  relative  permittivity  of   61   (78.5   for   phenol),   see   eq.   5   and   eq.   7.   SHAKE27   is   used   to   constrain   all   bond   lengths   with   a   geometric  accuracy  of  1  x  10-­‐4.  The  only  exceptions  are  the  two  bonds  were  the  bond  types  are   being  modified  in  the  benzamidine  simulations.  These  bond-­‐stretching  interactions  are  evaluated   trough   the   quartic   expression   of   eq.   20.   The   softness   parameters   for   atoms   that   are   being   perturbed  are  set  to  αLJ  =  0.5  and  αCRF  =  0.5  nm2  and  the  power  dependence  of  the  λ  coupling  (n   in  eq.  4)  is  equal  to  2  for  the  methanol  simulations  and  1  for  the  other  systems.  All  simulations   are  performed  for  1  ns  per  λ-­‐point.  As  the  first  test  case,  the  methanol  simulations  are  performed   at   101   equidistantly   spaced   λ-­‐points,   to   allow   direct   comparison   of   the   predicted   values   from   extended   TI.   The   other   two   systems   are   simulated   at   21   equidistantly   spaced   λ-­‐points.   For   the   methanol   system,   the   regular   TI   data   could   directly   be   used   as   reference   data,   whereas   for   the   latter   two   systems,   the   regular   TI   curve   is   smoothened   by   extended   TI   based   on   these   21   λS-­‐ points.  The  phenol  simulations  are  repeated  five  times  with  different  initial  velocities  to  obtain   statistically  independent  runs  and  determine  the  precision  of  the  method.   The   simulations   of   the   trypsin-­‐benzamidine   complex   have   been   set   up   as   described   in   our   previous  paper.28  Initial  simulations  showed  that  benzamidine  was  moving  out  of  the  binding  site   at   certain   λ-­‐points.   Therefore,   an   additional   distance   restraint   was   introduced   between   the   CG  

 

 

ACS Paragon Plus Environment

13  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 27

atom  of  ASP171  and  the  CE  atom  of  the  benzamidine  with  a  force  constant  of  500  kJ  mol-­‐1  nm-­‐2.   Similar   to   the   benzamidine   in   water,   the   trypsin-­‐benzamidine   complex   is   simulated   at   21   equidistantly  spaced  λ-­‐points.  However,  these  simulations  are  performed  for  3  ns  each,  to  allow   more  extensive  sampling.   The  quantities  described  in  eqs.  10,  13,  17,  19  and  20-­‐23  are  calculated  for   𝜇!!  =  {0,  0.01,  0.02,   …,  1}  every  20  time  steps  and  at  the  same  time  written  to  disk.  In  contrast,  since  the  quantities   described  in  eqs.  25  and  26  are  independent  of   µμX,  only  a  single  value  needs  to  be  calculated  and   written  to  disk  at  every  20th  time  step.  After  post-­‐processing  the  data,  including  the  reweighting   steps  as  described  in  eqs.  31-­‐33  the  trapezoidal  rule  is  used  as  numerical  integration  algorithm.   For   a   fair   comparison   of   the   BAR   and   (extended)   TI,   all   methods   were   performed   based   on   all   data   points   (including   correlated   data   points)   and   bootstrapping   based   on   100   bootstrap   samples  was  used  as  uniform  error  estimate.  

Results   Panel  A  in  Figure  3  shows  the  results  of  extended  TI  for  the  methanol  test  case.  The  reference  TI   data   is   indicated   by   the   black   curve   and   integration   results   in   a   free   energy   difference   of   23.8   kJ/mol.  The  two  regions  of  high  curvature  make  it  necessary  to  simulate  at  many  λ-­‐points  when   using   regular   TI.   The   colored   curves   in   Figure   3A   show   the   extended   TI   curves,   based   on   predictions   from   2   (red),   3   (green),   5   (blue),   6   (orange)   or   11   (cyan)   equidistant   values   of   λS.   When  using  extended  TI,  the  information  of  only  the  two  endpoints  is  already  enough  to  give  a   rough  estimate  of  the  shape  of  the  curve  as  well  as  the  location  of  the  maximum  and  minimum   (red  curve).  Furthermore,  with  as  few  as  6  equidistant  points  the  predicted  curve  (orange)  can   barely  be  distinguished  from  the  reference  TI  curve.  This  also  becomes  clear  from  the  differences   in  the  ΔG  values  with  respect  to  the  reference  data,  as  shown  in  Table  1.  For  comparison  these   differences  are  also  shown  for  regular  TI  and  BAR  with  n  λ-­‐points.  Extended  TI  with  6  equidistant   points  only  differs  by  0.8  kJ/mol  from  the  reference,  whereas  this  is  still  -­‐5.9  kJ/mol  for  regular   TI.  The  deviations  of  the  BAR  results  from  the  reference  data  are  very  small  even  from  just  2  λ-­‐ points.  However,  for  BAR  to  obtain  reliable  and  precise  free  energy  differences,  sufficient  overlap   between   the   forward   and   backward   energy   differences   from   neighboring   λ-­‐points   should   be   present.   It   has   therefore   been   suggested   that   the   overlap   integral   (OI)   between   each   of   the  

 

 

ACS Paragon Plus Environment

14  

 

Page 15 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

neighboring   λ-­‐points   should   at   least   be   0.01.6,11   Analysis   of   the   OIs   shows   that   BAR   results   are   reliable  only  with  5  or  more  λ-­‐points.  Close  agreement  between  free-­‐energy  differences  may  still   be  a  fortuitous  result  of  cancelation  of  errors.  Therefore,  Table  1  also  shows  the  mean  absolute   error   (MAE),   which   is   used   to   compare   each   point   of   the   predicted   curve   with   the   reference   curve   and   thereby   prevents   cancelation   of   errors   that   can   be   present   in   the   ΔG-­‐ΔGREF   measure.   For   the   calculation   of   the   MAE   of   a   regular   TI,   which   has   fewer   data   points   than   the   reference   data,   a   linear   interpolation   is   used   for   the   intermediate  λ-­‐points,   as   appropriate   for   integration   with  the  trapezoidal  rule.  The  MAE  for  extended  TI  with  6  equidistant  points  is  1.5  kJ/mol  and  for   TI  with  the  same  λ-­‐points  16.8  kJ/mol.  For  11  points,  the  deviation  of  the  regular  TI  curve  is  still   7.0   kJ/mol.   Using   non-­‐equidistant   points   as   discussed   in   the   theory   section   improves   the   performance  of  extended  TI  even  more.  Here,  5  points  lead  to  ΔG-­‐ΔGREF  of  1.4  kJ/mol  and  a  MAE   of   1.5   kJ/mol,   whereas   for   5   equidistant   λS-­‐points   these   values   were   still   4.0   and   4.1   kJ/mol,   respectively.   The   phenol   test   system   is   expected   to   be   more   difficult   because   13   atoms   are   being   perturbed   into  dummy  atoms,  instead  of  only  3  atoms  for  methanol.  As  Figure  3B  and  Table  2  show,  this  is   especially  an  issue  when  only  a  few  λ-­‐points  are  being  used.  With  just  two  or  three  λ-­‐points,  none   of  the  methods  result  in  an  accurate  free  energy  difference.  BAR  seems  to  have  a  small  deviation   from   the   reference   free   energy   for   5   equidistant   λ-­‐points   (1.6   kJ/mol),   but   in   fact   the   overlap   integral  criterium  is  only  met  when  6  λ-­‐points  are  used,  which  then  results  in  a  deviation  of  1.5   kJ/mol.  For  extended  TI  with  equidistant  points,  the  predicted  curve  looks  reasonably  similar  to   the  reference  one  between  λ  =  0.75  and  1,  but  is  far  off  for  all  other  λ-­‐points,  especially  around   λ=0.3   (see   Figure   3B).   This   is   predominantly   caused   by   the   prediction   from   the   simulation   at   λ=1.  This  prediction  is  extremely  difficult,  because  we  try  to  calculate  the  (free)  energies  for  the   situation   where   phenol   has   real   atoms   instead   of   dummy   atoms,   even   though   that   space  is   still   taken  up  by  water  molecules.  The  solvent  and  solute  atoms  would  thus  be  clashing,  causing  very   large  energy  estimates.  With  3  equidistant  λS-­‐points,  extended  TI  is  already  doing  a  lot  better,  but   still  has  a  local  minimum  around  λ=0.6  which  is  not  present  in  the  reference  TI  curve.   Starting   from   5   equidistant   points   this   additional   local   minimum   is   no   longer   predicted   (MAE   =   4.9   kJ/mol)   and   with   6   points   the   whole   curve   is   represented   very   well   with   the   MAE   equal   to   1.7   kJ/mol.  Table  S1  in  the  supporting  information  compares  the  root-­‐mean-­‐square  deviation  from  

 

 

ACS Paragon Plus Environment

15  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 27

the  reference  data  for  different  λ-­‐intervals,  using  BAR,  extended  TI  and  TI.  As  expected,  regular   TI   cannot   handle   λ-­‐intervals   larger   than   0.1   reliably,   while   both   BAR   and   extended   TI   show   better   agreement   up   to   Δλ   =   0.3   and   0.2,   respectively.   For   some   larger   λ-­‐intervals,   no   overlap   was  observed  between  the  states,  leading  to  large  deviation  for  both  BAR  and  extended  TI.  While   BAR   can   handle   larger   λ-­‐intervals,   including   the   curvature   at   intermediate   λ-­‐values   as   in   extended  TI  significantly  enhances  the  efficiency  of  TI.   Whereas   for   methanol   non-­‐equidistant   λ-­‐points   led   to   more   accurate   results   than   equidistant   points,  the  opposite  is  true  in  the  phenol  case.   The  problem  is  that  the  choice  of  the  λ-­‐points  is   based   on   the   difference   between   the   predictions   from   neighboring   simulation   points.   Since   the   predicted   values   from   λS   =   0   and   1   are   so   different   at   low   λP,   additional   λS-­‐points   are   placed   relatively  close  together  at  low  values.  For  5  non-­‐equidistant  points,  the  proposed   simulated  λS   were  0,  0.25,  0.4,  0.55  and  1.  The  region  around  the  minimum  in  the  reference  curve  can  only  be   predicted  accurately  once  a  larger  λS    (0.8)  was  included  as  6th  point.   The  benzamidine  case  is  more  complicated  since  not  only  non-­‐bonded  but  also  covalent  changes   are   being   made.   Furthermore,   this   example   represents   an   alchemical   change   that   does   not   involve   the   removal   of   many   atoms,   but   rather   a   change   of   character   (Figure   2).   Similar   to   the   phenol   test   case,   using   just   2   λ-­‐points   is   not   enough   to   obtain   accurate   results   with   any   of   the   methods  for  benzamidine  in  water  (see  Figure  3C  and  Table  3).  However,  the  extended  TI  result   is   not   so   far   off   as   in   the   phenol   case.   The   general   shape   of   the   curve   is   now   similar   to   the   reference   data,   although   the   values   around   the   minimum   around   λ=0.3   and   the   maximum   around  λ=0.75  are  off  by  20  and  10  kJ  mol-­‐1,  respectively.    Using  3  equidistant  λ-­‐points  results  in   an   accurate   free   energy   difference   (error   of   0.4   kJ/mol)   although   there   is   some   cancellation   of   errors   as   shown   by   the   MAE   of   1.7   kJ/mol.   From   5   points   onwards   also   the   MAE   is   low,   both   using   equidistant   as   well   as   non-­‐equidistant  λ-­‐points.   The   overlap   integral   criterium   for   BAR   is   reached  with  5  λ-­‐points  which  results  in  an  error  of  0.6  kJ/mol  with  respect  to  the  reference  data.   An  additional  challenge  is  to  perform  the  same  transformation  of  benzamidine  when  it  is  bound   to   the   protein   trypsin.   Similar   to   the   results   for   benzamidine   in   water,   simulations   at   two   λ-­‐ points  are  not  sufficient  to  obtain  accurate  results,  but  the  general  shape  of  the  curve  is  captured   quite  well  (Figure  3D).  Using  data  from  three  simulated  λ-­‐points  greatly  improve  the  results  for   the   region   up   to   λ=0.5.   In   contrast   to   the   benzamidine   in   water,   the   predicted   curve   does   not  

 

 

ACS Paragon Plus Environment

16  

 

Page 17 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

match  the  reference  data  so  well  for  λ  >  0.5.  This  also  becomes  clear  when  comparing  the  MAEs   for   the   simulations   in   water   and   when   bound   to   trypsin,   as   shown   in   Table   3   and   Table   4,   respectively.  In  water,  the  MAE  was  1.7  kJ/mol  for  3  λ-­‐points  whereas  this  is  4.1  kJ/mol  for  the   benzamidine  bound  to  trypsin.  For  BAR,  the  overlap  integral  criterium  is  met  with  at  least  5  λ-­‐ points,  leading  to  an  error  of  -­‐1.3  kJ/mol.  As  expected,  the  transformations  of  a  molecule  bound   to  a  protein  are  more  difficult  to  predict  from  fewer   λ-­‐points  than  the  same  transformations  in   solution.   However,   with   as   few   as   6   equidistant   λ-­‐points   the   MAE   is   down   to   1.1   kJ/mol.   The   scheme  to  automatically  determine  the  next  λ-­‐point  to  simulate  actually  resulted  in  equidistant   points  up  to  5  λ-­‐points.  Using  6  non-­‐equidistant  λ-­‐points  gives  a  slightly  higher  MAE  (2.0  kJ/mol)   than  the  equidistant  scheme  (1.1  kJ/mol).     Overall,   extended   TI   is   giving   very   good   results   using   only   6   equidistant   λ-­‐points   (for   benzamdine   in   water   even   with   5   λ-­‐points),   whereas   regular   TI   usually   requires   around   21   λ-­‐ points   to   obtain   the   same   level   of   accuracy.   Hereby,   extended   TI   comes   very   close   to   the   possibilities   of   BAR   which   also   required   5   or   6   λ-­‐points   to   get   reliable   results   for   our   test   systems.   While   we   do   not   claim   that   extended   TI   is   more   accurate   than   BAR,   one   of   the   advantages  of  extended  TI  over  BAR  is  that  no  additional  post-­‐analysis  on  coordinate  trajectories   is   required   when   simulations   at   intermediate   λ-­‐points   are   needed.   A   further   advantage   of   the   current  implementation  of  extended  TI  in  the  GROMOS  code  is  moreover,  the  possibility  to  write   out  additional  energy  and  free  energy  terms  which  not  only  allows  the  performance  of  extended   TI  but  also  much  faster  post-­‐analysis  for  application  of  BAR.       So  far  we  have  only  discussed  the  accuracy  of  extended  TI,  but  the  precision  is  important  just  as   well.   To   gain   more   insight   in   the   latter,   we   have   repeated   the   phenol   simulations   5   times   with   different  initial  velocities.  Additionally,  for  each  of  these  5  simulations  we  test  the  robustness  of   extended  TI  by  reducing  the  amount  of  data  included  in  the  analysis.  We  use  the  full  1  ns,  just  the   first  500  ps,  or  data  collected  at  80  fs  (every  second  stored  energy)  and  200  fs  (every  fifth  stored   energy)  intervals.    The  MAEs  averaged  over  the  5  repeats  as  well  as  the  standard  deviations  over   the  repeats  are  shown  in  Table  5  and  Table  6.  As  was  shown  above  for  the  single  simulation  of   phenol   using   just   the   endpoints   leads   to   very   large   errors,   even   when   using   all   data   (1   ns   and  

 

 

ACS Paragon Plus Environment

17  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 27

using  every  frame).  The  average  MAE  over  the  5  repeats  is  around  3  x  102  kJ/mol  with  a  standard   deviation   of   2   x   102   kJ/mol.   Including   λ=0.5   greatly   improves   the   average   MAE   and   also   the   standard   deviation   is   much   lower   with   a   value   of   2.9   kJ/mol   (see   Table   5).   Using   only   the   first   half  of  the  simulation  causes  an  increase  of  the  average  MAE  of  1  x  102  kJ/mol  for  2  λ-­‐points.  For   3   equidistant   points   this   is   difference   is   smaller   but   still   around   4   kJ/mol.   In   both   cases   the   standard   deviation   was   also   slightly   larger   for   the   predictions   based   on   500   ps   of   simulations.   From  5  equidistant  points  onwards  the  differences  between  the  predictions  based  on  half  or  the   full   data   are   not   so   pronounced   any   more.   Using   every   second   frame   from   the   simulation   data   results  in  the  same  number  of  frames  as  using  the  first  500  ps,  but  the  results  are  rather  different.   In   contrast   to   the   results   from   500   ps,   the   average   MAEs   for   every   second   frame   are   hardly   different   from   the   results   of   the   full   data.   Using   every   fifth   frame,   which   corresponds   to   using   data   from   every   100th   time-­‐step   or   every   0.2   ps,   again   causes   higher   average   MAEs   when   only   few   λ-­‐points   are   used,   although   the   standard   deviation   is   lower.   With   increasing   number   of   λ-­‐ points,   the   results   become   similar   to   the   full   data   results.   Overall,   if   5   or   more   equidistant   λ-­‐ points  are  used  the  standard  deviation  is  always  ≤  1  kJ/mol,  independent  of  which  frames  of  the   simulations  were  used.     Whether  results  will  be  influenced  by  reducing  the  amount  of  data  included  in  the  analysis  will   depend   on   the   correlation   times   of   the   quantities   in   question.   In   the   case   of   TI,   practically   all   correlation  times  of  ∂H/∂λ  are  larger  than  0.2  ps  (which  equals  every  5th  frame).  This  indicates   that   the   TI   results   will   not   change   much   upon   including   only   every   5th   frame   in   the   analysis   (Table  5).  The  correlation  time  of  extended  TI  is  more  complicated  to  determine  because  of  the   reweighting   in   eq.   31   and   because   from   every   simulation,   100   predictions   are   made   which   all   have   their   own   correlation   times.   For   simplicity,   we   only   evaluate   the   correlation   time   of   the   numerator   in   eq.   31,   which   should   be   lower   than   that   of   the   denominator   because   of   the   multiplication  with  ∂H/∂λ.  The  vast  majority  of  the  evaluated  correlation  times  are  smaller  than   0.2   ps   and   it   can   be   observed   that   larger   differences   between   λP   and   λS   tend   to   give   smaller   correlation  times.    This  is  also  reflected  in  the  results  of  Table  5,  where  the  differences  in  MAE   upon  reducing  the  data  to  every  5th  frame  are  larger  when  fewer  λ-­‐points  are  used.   Interestingly,  the  accuracy  of  TI  improves  more  from  including  more  λ-­‐points  than  from  longer   simulations,  whereas  this  is  not  necessarily  the  case  for  extended  TI.  Considering  a  constant  total  

 

 

ACS Paragon Plus Environment

18  

 

Page 19 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

simulation  time  of  around  11  ns  for  each  of  the  five  repeats  of  the  phenol  simulations,  either  11   points  of  each  1  ns  or  21  λ-­‐points  of  each  0.5  ns  can  be  used.  TI  improves  from  an  average  MAE  of   9.7   to   2.5   kJ/mol   when   including   more   λ-­‐points,   whereas   the   results   for   extended   TI   hardly   change  with  average  MAEs  of  1.0  and  0.8  kJ/mol,  respectively  (not  all  data  shown).     As   discussed   above,   the   scheme   to   determine   the   non-­‐equidistant   λ-­‐points   to   simulate   (see   theory  section)  can  give  non-­‐optimal  results  when  the  perturbation  is  large.  This  is  also  shown  in   Table  6  for  the  5  repeats  of  the  phenol  system.  Using  three  non-­‐equidistant  λ-­‐points  gives  much   larger   average   MAEs   than   the   equidistant   points   and   in   particular   the   standard   deviations   are   also  very  large.  Small  differences  in  the  simulated  end  points  can  determine  a  different  λ-­‐point  to   simulate  next  and  this  is  what  causes  the  large  differences  in  the  MAEs.  If  the  same  (albeit  non-­‐ equidistant)   points   would   be   used   among   the   repeats,   the   standard   deviation   would   be   much   smaller  (data  not  shown).  Similar  to  the  equidistant  points,  there  is  not  much  difference  between   using   the   full   data   or   every   second   frame.   The   first   half   of   the   simulation   gives   higher   average   MAEs  than  the  full  simulation,  which  is  mostly  caused  by  a  single  repeat  with  a  very  large  MAE.   Using   every   fifth   frame   gives   even   higher   average   MAEs,   mostly   because   there   are   multiple   repeats  with  large  errors.  This  is  predominantly  caused  by  the  choice  of  λS-­‐points.  For  example,   using   6   non-­‐equidistant   points   the   average   MAE   equals   13   ±   9   kJ/mol,   whereas   this   would   be   only  5  ±  4  kJ/mol  if  the  same  λS-­‐points  were  used  as  determined  from  all  data  (data  not  shown).  

Discussion   The  results  in  Table  6  showed  that  the  scheme  to  determine  the  next  λ  point  to  simulate  is  not   very  reliable  for  large  perturbations.  However,  the  scheme  can  be  modified  such  that  at  step  1,   we  simulate  e.g.  3  equidistant  points  instead  of  only  the  end  points  and  then  continue  at  step  2  as   indicated.  This  greatly  improves  the  results  for  the  phenol  simulations  as  shown  in  Table  S2.   In  principle  one  can  start  with  any  choice  of  λ-­‐points,  equidistant  or  not,  and  then  use  the  scheme   to  find  the  λ  points  that  would  be  the  best  to  add  to  further  refine  the  curve.  This  is  probably  one   of  the  most  efficient  strategies,  because  to  determine  the  next  λ-­‐point  to  simulate  with  the  above-­‐ mentioned  scheme,  the  simulation  at  the  previous  λ-­‐point  must  have  finished.  On  the  other  hand,   when  equidistant  points  are  used  (or  any  other  set  of  predefined  λ-­‐points),  the  simulations  can   be  performed  in  parallel.  

 

 

ACS Paragon Plus Environment

19  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 27

The  results  in  Table  5  and  6  show  that  (at  least  for  this  system)  it  might  not  be  necessary  to  write   out  the  energies  and  free  energies  every  20  time  steps,  but  this  can  be  reduced  to  at  least  every   40  time  steps.  This  would  reduce  both  the  run  time  of  the  simulations  as  well  as  the  required  disk   space.  Additional  reduction  of  resources  can  be  achieved  when  the  energy  contributions  are  not   calculated  for  the  whole  range  of  λ-­‐points.  Especially  when  using  equidistant  λS,  this  can  be  easily   achieved   because   the   predictions   only   need   to   be   made   up   to   the   neighboring   λS-­‐points.   For   example,  if  6  equidistant  λ-­‐points  are  simulated,  then  for  λS  =  0.2  predictions  are  only  required   for  λP  ranging  between  0.0  and  0.4.     The  additional  simulation  time  due  to  the  calculations  for  extended  TI  is  determined  by  a  number   of   factors.   These   include   how   often   the   additional   calculations   are   performed   (every   X   timesteps),  the  number  of  λP  for  which  the  interactions  are  determined,  the  number  of  perturbed   atoms  and  the  total  size  of  the  system.  We  used  the  benzamidine-­‐in-­‐water  system  as  a  test  case,   where  the  additional  calculations  are  performed  every  20  time  steps  for  101  λP-­‐points  and  four   atoms,   two   bond   lengths,   one   angle   and   one   dihedral   angle   are   perturbed.   The   total   number   of   atoms   in   the   system   was   2467   and   on   a   single   cpu,   extended   TI   takes   around   10%   more   simulation   time   than   a   regular   TI   simulation.   For   larger   systems   this   percentage   would   be   significantly   lower,   because   the   percentage   of   interactions   involving   perturbed   atoms   is   much   smaller   than   in   a   small   system.   Extended   TI   thus   results   in   a   69%   gain   of   efficiency   since   only   around  6  λ-­‐points  need  to  be  simulated  instead  of  the  most  commonly  used  21  λ-­‐points  with  an   additional  cost  of  only  10%  or  less  per  λ-­‐point.     The   above-­‐described   application   of   predicting   the   TI   curve   from   fewer   λ-­‐points   is   not   the   only   possible   application   of   the   current   code.   The   energy   terms   of   eqs.   10,   13,   17   and   19-­‐26   are   written  out  separately  because  this  leaves  much  more  flexibility  in  terms  of  post-­‐analysis.  Up  to   now   we   have   only   predicted   for   other   λP-­‐points   than   the   λS-­‐points   at   which   simulations   were   performed  with,  but  in  principle  predictions  can  be  made  for  changes  in  any  of  the  parameters   that  influence  the  free  energy  pathway,  as  long  as  the  intrinsic  outcome  of  the  calculation  does   not  change.  Systematically  searching  this  parameter  space  can  lead  to  insight  in  the  most  efficient   pathway   and   thereby   further   improve   the   efficiency   of   TI.   Naden   et   al.   have   also   explored   the   parameter  space  to  find  the  optimal  path,  by  making  use  of  linear  combinations  of  λ-­‐independent  

 

 

ACS Paragon Plus Environment

20  

 

Page 21 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

basis   functions   describing   the   non-­‐bonded   interactions.19,29   However,   their   approach   circumvents   the   optimization   of   softcore   parameters.   In   our   current   approach,   parameter   changes   can   e.g.   involve   separate   changes   of   parameterization   of   the   individual   µμx   values   (including  softness  parameters)  according  to  eq.  3  or  of  the  power  dependency  of   µμ,   n.  As  these   parameters   can   have   a   strong   influence   on   the   shape   of   the   TI   curve,9,29   they   directly   influence   the  efficiency  of  the  TI  because  high  curvature  regions  require  closely  spaced  λ-­‐points  in  order  to   integrate  accurately.  When  using  BAR  one  could  explore  the  influence  of  simulation  parameters   on   efficiency   by   using   non-­‐Boltzmann   Bennett,   but   this   would   require   the   expensive   re-­‐ evaluation  of  the  coordinate  trajectories.30  The  current  implementation  of  extended  TI  allows  the   reconstruction  of  the  total  energies  for  each  of  the  parameter  sets  from  the  pre-­‐computed  values   from  the  simulations  of  a  single  parameter  set  and  this  should  therefore  be  extremely  fast.  These   possibilities  will  be  explored  in  future  work.  

Supporting  information   The  Supporting  Information  is  available  free  of  charge  on  the  ACS  Publications  website.   -­‐  Root  mean  square  deviations  over  all  possible  λ-­‐intervals  comparing  BAR,  extended  TI  and  TI   for  the  phenol  system   -­‐  Average  MAE  values  for  different  initial  λ-­‐points  for  the  non-­‐equidistant  λ-­‐points  scheme.    

Acknowledgements   The  authors  thank  Drazen  Petrov  for  the  initial  setup  of  the  phenol  system.  Financial  support  by   the   Vienna   Science   and   Technology   Fund   (WWTF;   grant   number   LS08-­‐QM03)   is   gratefully   acknowledged.      

 

 

 

ACS Paragon Plus Environment

21  

 

Journal of Chemical Theory and Computation

Figures    

λ

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 27

1

0 λ

 

Figure  1.  Schematic  overview  of  extended  TI.  The  red  dots  represent  λ-­‐values  that  are  explicitly  simulated   and   for   which     is   obtained.   The   resulting   TI   curve   is   indicated   by   the   red   line.   By   predicting   𝜕𝐻/𝜕λ !!  at  intermediate,  non-­‐simulated  λ-­‐values,  the  black  curve  may  be  obtained.  

 

  Figure  2.  The  three  model  systems  used  in  this  paper;  methanol  to  dummy  atoms  (top),  phenol  to  dummy   atoms  (middle)  and  p-­‐nitrobenzamidine  to  p-­‐methoxybenzamidine  (bottom).  D  indicates  a  non-­‐interacting   dummy  atom.  

 

 

 

ACS Paragon Plus Environment

22  

 

Page 23 of 27

A 200

100

0

B 200 100

[kJ/mol]

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0 -100 C

-100

-120

-140

-160 D

-100

-120

-140

-160 0

0.2

0.4

0.6

0.8

1

λ

 

Figure   3.   Results   for   extended   TI   using   equidistant   points   for   methanol   (A),   phenol   (B),   benzamidine   in   water   (C)   and   benzamidine   bound   to   trypsin   (D).   Different   colors   show   the   number   of   λ   points   used   for   extended   TI:   red:   2   points,   green:   3   points,   blue:   5   points,   orange:   6   points   and   cyan:   11   points.   The   thick   black  curve  with  error  bars  shows  the  data  from  regular  TI.    

           

 

 

ACS Paragon Plus Environment

23  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 27

Table   1.   Deviations   from   the   reference   data   using   different   numbers   of   simulations   for   methanol   (reference:   ΔGREF   =   23.8   ±   0.0   kJ/mol).   The   error   estimates   given   on   the   free   energy   differences   with   respect   to   the   reference  data  are  based  on  bootstrapping  analysis  and  do  not  include  the  error  estimate  of  the  reference   data.     MAE   extTI TI   extTI extTI (non.eq)   (eq)   (non.eq)   2   59.0  ±  0.1   -­‐1.3  ±  2.8   9.1  ±  0.3   9.1  ±  0.3     70.5   10.2   10.2   3   13.8  ±  0.1   -­‐0.2  ±  0.3   8.6  ±  0.2   3.9  ±  0.2     28.5   9.2   5.1   5   -­‐7.9  ±  0.1   -­‐0.6  ±  0.2   4.0  ±  0.3   1.4  ±  0.1     15.9   4.1   1.5   6   -­‐5.9  ±  0.1   -­‐0.1  ±  0.1   0.8  ±  0.1   1.0  ±  0.1     16.8   1.5   1.1   11   -­‐3.2  ±  0.1   0.1  ±  0.1   0.1  ±  0.0   0.6  ±  0.0     7.0   0.8   0.7   n  refers  to  the  number  of  simulated  λ-­‐points.  MAE  is  the  mean  absolute  error  with  respect  to  the  reference   TI   curve.   extTI   (eq)   and   extTI   (non.eq)   refer   to   the   results   of   extended   TI   with   equidistant   and   non-­‐ equidistant  points,  respectively.   n  

TI  

ΔG-ΔGREF BAR   extTI (eq)  

  Table   2.   Deviations   from   the   reference   data   using   different   numbers   of   simulations   for   phenol   (reference:   ΔGREF   =   68.0   ±   0.1   kJ/mol).   The   error   estimates   given   on   the   free   energy   differences   with   respect   to   the   reference  data  are  based  on  bootstrapping  analysis  and  do  not  include  the  error  estimate  of  the  reference   data.   MAE   extTI extTI (eq)   (non.eq)   2 2   22.5  ±  0.1   (-­‐5  ±  1)  x  10   -­‐243  ±  8   -­‐243  ±  8     53.8   251.4   251.4   3   11.3  ±  0.1   -­‐24.3  ±  3.8   3.1  ±  0.4   -­‐72.1  ±  2.0     47.2   23.2   86.3   5   -­‐2.5  ±  0.1   1.6  ±  0.2   3.9  ±  0.2   5.5  ±  0.3     34.7   4.9   15.9   6   -­‐12.0  ±  0.1   1.5  ±  0.1   1.6  ±  0.2   1.7  ±  0.1     25.1   1.7   2.7   11   -­‐2.7  ±  0.1   0.5  ±  0.1   0.4  ±  0.0   0.5  ±  0.0     9.8   0.5   0.5   n  refers  to  the  number  of  simulated  λ-­‐points.  MAE  is  the  mean  absolute  error  with  respect  to  the  reference   TI   curve.   extTI   (eq)   and   extTI   (non.eq)   refer   to   the   results   of   extended   TI   with   equidistant   and   non-­‐ equidistant  points,  respectively.   n  

TI  

ΔG-ΔGREF BAR  

extTI (eq)  

extTI (non.eq)  

TI  

  Table  3.  Deviations  from  the  reference  data  using  different  numbers  of  simulations  for  benzamidine  in  water   (reference:   ΔGREF   =   -­‐131.8   ±   0.0   kJ/mol).   The   error   estimates   given   on   the   free   energy   differences   with   respect  to  the  reference  data  are  based  on  bootstrapping  analysis  and  do  not  include  the  error  estimate  of   the  reference  data.   MAE   extTI TI   extTI extTI (non.eq)   (eq)   (non.eq)   2   24.3  ±  0.1   2.2  ±  3.6   -­‐1.7  ±  0.5   -­‐1.7  ±  0.5     25.5   9.9   9.9   3   8.6  ±  0.2   1.1  ±  0.4   0.4  ±  0.2   0.7  ±  0.2     12.7   1.7   2.6   5   1.4  ±  0.1   0.6  ±  0.1   0.0  ±  0.1   -­‐0.7  ±  0.1     7.8   0.6   1.3   6   0.8  ±  0.1   0.1  ±  0.1   -­‐0.6  ±  0.0   -­‐0.4  ±  0.0     7.0   0.7   0.9   11   0.4  ±  0.1   -­‐0.1  ±  0.1   0.0  ±  0.0   -­‐0.1  ±  0.0     2.1   0.2   0.3   n  refers  to  the  number  of  simulated  λ-­‐points.  MAE  is  the  mean  absolute  error  with  respect  to  the  reference   TI   curve.   extTI   (eq)   and   extTI   (non.eq)   refer   to   the   results   of   extended   TI   with   equidistant   and   non-­‐ equidistant  points,  respectively.       n  

 

TI  

ΔG-ΔGREF BAR   extTI (eq)  

 

ACS Paragon Plus Environment

24  

 

Page 25 of 27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table  4.  Deviations  from  the  reference  data  using  different  numbers  of  simulations  for  benzamidine  when   bound   to   trypsin   (reference:   ΔGREF   =   -­‐124.1   ±   0.0   kJ/mol).   The   error   estimates   given   on   the   free   energy   differences   with   respect   to   the   reference   data   are   based   on   bootstrapping   analysis   and   do   not   include   the   error  estimate  of  the  reference  data.   MAE   extTI extTI (eq)   (non.eq)   2   14.8  ±  0.1   -­‐7.9  ±  1.8   -­‐8.5  ±  0.3   -­‐8.5  ±  0.3     15.8   9.0   9.0   3   4.3  ±  0.1   -­‐3.5  ±  0.4   -­‐3.4  ±  0.1   -­‐3.4  ±  0.1     12.8   4.1   4.1   5   -­‐0.5  ±  0.1   -­‐1.3  ±  0.1   -­‐1.9  ±  0.1   -­‐1.9  ±  0.1     6.0   2.0   2.0   6   1.3  ±  0.0   0.3  ±  0.0   0.2  ±  0.0   -­‐2.0  ±  0.0     5.7   1.1   2.0   11   0.5  ±  0.0   0.1  ±  0.0   0.1  ±  0.0   -­‐0.4  ±  0.0     2.0   1.0   0.8   n  refers  to  the  number  of  simulated  λ-­‐points.  MAE  is  the  mean  absolute  error  with  respect  to  the  reference   TI   curve.   extTI   (eq)   and   extTI   (non.eq)   refer   to   the   results   of   extended   TI   with   equidistant   and   non-­‐ equidistant  points,  respectively.   n  

TI  

ΔG-ΔGREF BAR  

extTI (eq)  

extTI (non.eq)  

TI  

  Table  5.  Average  and  standard  deviation  of  the  MAE  over  5  repeats  of  the  phenol  simulations,  while  using  all   or  reduced  amount  of  data.  2,  3,  5,6  and  11  equidistant  spaced  λ-­‐points  are  used.     1.0 ns 0.5 ns nd every 2 th every 5

2 2   (3  ±  2)  x  10 2   (4  ±  2)  x  10 2   (3  ±  2)  x  10 2   (6  ±  1)  x  10

3 22  ±  3   26  ±  4   23  ±  3   24  ±  2  

5 4.5  ±  0.7   4.9  ±  1.0   4.6  ±  0.8   5.0  ±  0.9  

6 2.1  ±  0.6   2.4  ±  0.6   2.4  ±  0.7   2.6  ±  0.4  

11 1.0  ±  0.3   1.2  ±  0.5   1.1  ±  0.3   1.1  ±  0.3  

  Table  6.  Average  and  standard  deviation  of  the  MAE  over  5  repeats  of  the  phenol  simulations,  while  using  all   or  reduced  amount  of  data.  2,  3,  5,6  and  11  non-­‐equidistant  spaced  λ-­‐points  are  used.     1.0 ns 0.5 ns nd every 2 th every 5

 

2 2 (3  ±  2)  x  10   2 (4  ±  2)  x  10   2 (3  ±  2)  x  10   2 (6  ±  1)  x  10  

3 1   (10  ±  6)  x  10 1   (12  ±  7)  x  10 1   (10  ±  6)  x  10 1   (17  ±  7)  x  10

5 17  ±  10   17  ±      2   14  ±      7   27  ±  13  

 

6 5  ±  4   6  ±  4   5  ±  4   13  ±  9  

ACS Paragon Plus Environment

11 0.7  ±  0.3   1.2  ±  0.3   0.8  ±  0.3   0.7  ±  0.4  

25  

 

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 27

 

References   (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)     (11)     (12)     (13)     (14)     (15)     (16)     (17)     (18)     (19)     (20)     (21)     (22)     (23)     (24)     (25)     (26)     (27)     (28)     (29)     (30)      

 

 

Shirts,  M.  R.;  Mobley,  D.  L.;  Chodera,  J.  D.  In  Annual  Reports  in  Computational  Chemistry;   D.C.  Spellmeyer  and  R.  Wheeler,  Ed.;  Elsevier,  2007;  Vol.  Volume  3,  pp  41–59.   Shirts,  M.  R.;  Mobley,  D.  L.;  Brown,  S.  P.  In  Drug  Design:  Structure-­‐  and  Ligand-­‐Based   Approaches;  Merz,  K.  M.,  Ringe,  D.,  Reynolds,  C.  H.,  Eds.;  Cambridge  University  Press:  New   York,  NY,  2010;  pp  61–86.   Mobley,  D.  L.;  Klimovich,  P.  V.  J.  Chem.  Phys.  2012,  137  (23),  230901.   Christ,  C.  D.;  Fox,  T.  J.  Chem.  Inf.  Model.  2014,  54  (1),  108–120.   Kirkwood,  J.  G.  J.  Chem.  Phys.  1935,  3  (5),  300.   Bennett,  C.  H.  J.  Comput.  Phys.  1976,  22  (2),  245–268.   Shirts,  M.  R.;  Pande,  V.  S.  J.  Chem.  Phys.  2005,  122  (14),  144107–144116.   Pham,  T.  T.;  Shirts,  M.  R.  J.  Chem.  Phys.  2011,  135  (3),  034114.   de  Ruiter,  A.;  Boresch,  S.;  Oostenbrink,  C.  J.  Comput.  Chem.  2013,  34  (12),  1024–1034.   Shirts,  M.  R.;  Chodera,  J.  D.  J.  Chem.  Phys.  2008,  129  (12),  124105–124110.   Bruckner,  S.;  Boresch,  S.  J.  Comput.  Chem.  2011,  32  (7),  1303–1319.   Paliwal,  H.;  Shirts,  M.  R.  J.  Chem.  Theory  Comput.  2011,  7  (12),  4115–4134.   Tidor,  B.;  Karplus,  M.  Biochemistry  1991,  30  (13),  3217–3228.   Beutler,  T.  C.;  Mark,  A.  E.;  van  Schaik,  R.  C.;  Gerber,  P.  R.;  van  Gunsteren,  W.  F.  Chem.  Phys.   Lett.  1994,  222  (6),  529–539.   Li,  H.;  Yang,  W.  Chem.  Phys.  Lett.  2007,  440  (1–3),  155–159.   Riniker,  S.;  Christ,  C.  D.;  Hansen,  H.  S.;  Hünenberger,  P.  H.;  Oostenbrink,  C.;  Steiner,  D.;  van   Gunsteren,  W.  F.  J.  Phys.  Chem.  B  2011,  115  (46),  13570–13577.   Hritz,  J.;  Oostenbrink,  C.  J.  Chem.  Phys.  2008,  128  (14),  144121.   Torrie,  G.  M.;  Valleau,  J.  P.  J.  Comput.  Phys.  1977,  23  (2),  187–199.   Naden,  L.  N.;  Shirts,  M.  R.  J.  Chem.  Theory  Comput.  2015,  11  (6),  2536–2549.   Schmid,  N.;  Christ,  C.  D.;  Christen,  M.;  Eichenberger,  A.  P.;  van  Gunsteren,  W.  F.  Comput.   Phys.  Commun.  2012,  183  (4),  890–903.   Reif,  M.  M.;  Hünenberger,  P.  H.;  Oostenbrink,  C.  J.  Chem.  Theory  Comput.  2012,  8  (10),   3705–3723.   Reif,  M.  M.;  Winger,  M.;  Oostenbrink,  C.  J.  Chem.  Theory  Comput.  2013,  9  (2),  1247–1264.   Oostenbrink,  C.;  Villa,  A.;  Mark,  A.  E.;  Van  Gunsteren,  W.  F.  J.  Comput.  Chem.  2004,  25  (13),   1656–1676.   Berendsen,  H.  J.  C.;  Postma,  J.  P.  M.;  Van  Gunsteren,  W.  F.;  Hermans,  J.  In  Intermolecular   Forces;  Reidel:  Dordrecht,  The  Netherlands,  1981;  Vol.  11,  pp  331–342.   Berendsen,  H.  J.  C.;  Postma,  J.  P.  M.;  van  Gunsteren,  W.  F.;  DiNola,  A.;  Haak,  J.  R.  J.  Chem.   Phys.  1984,  81  (8),  3684–3690.   Tironi,  I.  G.;  Sperb,  R.;  Smith,  P.  E.;  van  Gunsteren,  W.  F.  J.  Chem.  Phys.  1995,  102  (13),   5451–5459.   Ryckaert,  J.-­‐P.;  Ciccotti,  G.;  Berendsen,  H.  J.  C.  J.  Comput.  Phys.  1977,  23  (3),  327–341.   de  Ruiter,  A.;  Oostenbrink,  C.  J.  Chem.  Theory  Comput.  2012,  3686–3695.   Naden,  L.  N.;  Pham,  T.  T.;  Shirts,  M.  R.  J.  Chem.  Theory  Comput.  2014,  10  (3),  1128–1149.   König,  G.;  Boresch,  S.  J.  Comput.  Chem.  2011,  32  (6),  1082–1090.  

 

 

ACS Paragon Plus Environment

26  

 

Page 27 of 27

For  Table  of  Contents  Only    

λ

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0

 

λ

1

 

 

ACS Paragon Plus Environment

27