Large-Scale Functional Group Symmetry Adapted Perturbation Theory

Jan 18, 2018 - How to Succeed at Scientific Collaboration | Part 1: A Cross-Disciplinary Collaboration. Late last year @ACS4Authors surveyed over 1,50...
1 downloads 10 Views 8MB Size
Subscriber access provided by READING UNIV

Article

Large-Scale Functional Group Symmetry Adapted Perturbation Theory on Graphical Processing Units Robert M. Parrish, Keiran C. Thompson, and Todd J. Martínez J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b01053 • Publication Date (Web): 18 Jan 2018 Downloaded from http://pubs.acs.org on January 18, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Large-Scale Functional Group Symmetry Adapted Perturbation Theory on Graphical Processing Units Robert M. Parrish,†,‡ Keiran C. Thompson,†,‡ and Todd J. Mart´ınez∗,†,‡ † Department of Chemistry and the PULSE Institute, Stanford University, Stanford, CA 94305 ‡ SLAC National Accelerator Laboratory, Menlo Park, CA 94025 Received January 14, 2018; E-mail: [email protected]

Abstract: Symmetry-adapted perturbation theory (SAPT) is a valuable method for analyzing intermolecular interactions. The functional group SAPT partition (F-SAPT) has been introduced to provide additional insight into the origins of noncovalent interactions. Until now, SAPT analysis has been too costly for large ligand-protein complexes where it could provide key insights for chemical modifications that might improve ligand binding. In this paper, we present a large-scale implementation of a variant of F-SAPT. Two pragmatic choices are made from the outset to render the problem tractable: (1) ab initio computation of dispersion and exchange-dispersion is replaced with Grimme’s empirical dispersion correction and (2) basis sets with augmented functions are avoided to allow for efficient integral screening. These choices allow the F-SAPT analysis to be written largely in terms of Coulomb and exchange matrix builds which have been implemented efficiently on graphical processing units (GPUs). Our formulation of F-SAPT is routinely applicable to molecules with well over 3000 atoms and 25000 basis functions, and is particularly optimized for the case where one monomer is significantly larger than the other. This is demonstrated explicitly with results from F-SAPT analysis of the full indinavir @ HIV-II protease complex (PDB ID 1HSG) in a polarized double zeta basis.

Introduction: A detailed quantification of intermolecular interactions is critical to the design of myriad chemical processes such as drug-protein binding, 1–11 organic catalystsubstrate interactions, 12–15 molecular crystal packing, 16–24 and many others. New computational tools are required to efficiently provide insight for large-scale intermolecular interactions involving > 1000 atoms. Much progress has been obtained in the past few decades in the efficient and accurate ab initio computation of intermolecular interaction energies. While the most obvious approach to the interaction energy is the simple “supermolecular” approach, a compelling alternative is symmetry-adapted perturbation theory (SAPT), 25–27 which directly seeks the interaction energy by perturbation series expansion of the interaction Hamiltonian coupled with explicit antisymmetrization of the dimer wavefunction at each order in the perturbation series. SAPT provides an intuitive breakdown of the interaction energy into electrostatic, exchange, induction, and dispersion components - these emerge as the first few terms in the SAPT perturbation series. Numerous other “energy decomposition analysis” methods 28–30 exist, with similar goals as SAPT. SAPT has been developed using a wide variety of electronic structure theories, including simple Hartree-Fock description of the monomers [SAPT0 or SAPT(HF)], 27

density functional theory description of the monomers [DFT-SAPT or SAPT(DFT)], 31–35 high-order many-body perturbation theory 27,36,37 or coupled-cluster theory 38 [e.g., SAPT2+(CCD)], and high-order response theory [SAPT(CCSD) and related methods]. 39–43 The implementation of SAPT methodology with the density fitting approximation 44–51 and/or natural orbital truncation schemes has enabled SAPT0 and SAPT(DFT) computations on molecules with up to 300 atoms and 5000 basis functions, 34,35,52–54 SAPT2+(CCD) computations on molecules with up to 30 atoms and 1100 basis functions, 55–57 and SAPT(CCSD) computations on molecules with up to 10 atoms and a few hundred basis functions. 43 Additionally, integral-direct Fock matrix builds and multipole bound integral estimates have enabled opposite spin-scaled SAPT0 computations on molecules with up to 1100 atoms and 10000 basis functions. 58 These developments provide a framework where higher-accuracy SAPT methods can be used to provide quantitative interaction energies for small model complexes, and then the lower-accuracy SAPT methods can be employed to provide semi-quantitative predictions in larger-scale noncovalent interactions. In particular, it has been found that SAPT0 in the modest jun-cc-pVDZ basis set is remarkably accurate compared to higher-level methods. 55,59,60 Here, most of the accuracy comes from the cancellation of errors between the uncoupled treatment of dispersion in SAPT0 and the limited ability of the jun-ccpVDZ basis set to represent electron correlation. A similar “Pauling point” exists in supermolecular interaction computations when using MP2 in the jun-cc-pVDZ basis set. However, even SAPT0/jun-cc-pVDZ is difficult to apply for more than ∼ 300 atoms due to the combination of O(N 5 ) scaling exchange-dispersion and the vexing presence of augmented functions in the basis set. Another consideration is that the interaction energy becomes increasingly difficult to interpret as the system size grows - what one would ultimately prefer is a way to robustly attribute contributions of the interaction energy to pairs of localized pieces of the two interacting bodies. That is, rather than the scalar interaction energy E int , one would prefer int an “order-2” partition EAB where A and B label atoms or functional groups in the two involved monomers, and where P int E int = AB EAB . This was the subject of previous efforts on atomic 61 and functional-group 62 SAPT partitions (A-SAPT and F-SAPT). In these approaches, an order-2 decomposition of the SAPT perturbation series terms was identified in terms of the nuclei and ground state electronic densities of the monomers, and then the final order-2 partition was obtained by providing a localized definition of the electronic density. In A-SAPT, the electronic density was

ACS Paragon Plus Environment 1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

divided into atomic contributions via Wheatley’s iterative stockholder analysis (ISA, 63–65 related to iterative Hirshfelder atomization 66,67 ). In F-SAPT, the electronic density was divided into localized orbital contributions using Knizia’s intrinsic bond orbitals (IBOs, 68,69 a form of PipekMezey orbital localization 70 ). The nuclear and localized orbital order-2 partition was then reduced to the order-2 functional group partition via a set of systematic weights that assign fractions of nuclei and localized orbitals to userspecified chemical functional groups. F-SAPT has been shown to provide useful insights into complicated noncovalent interactions, including a direct confirmation 71 of the Wheeler-Houk model of through-space substituent effects in substituted sandwich benzene dimers, 12,72 and a detailed explanation of the enhanced binding of chloroindole-based vs. methylindol-based ligands in the S1 pocket of Factor Xa. 73 The implementation of F-SAPT used in these works is available in Psi4, 74,75 which uses density fitting and is practically limited to ∼ 300 atoms and ∼ 3500 basis functions. It should be noted that there also exists an alternative and increasingly popular set of approaches to visualize noncovalent interactions in terms of heuristic scalar fields extracted from characteristics of the electron density. 76–80 It should also be noted that the F-SAPT partition is neither unique nor derivable - it is merely a physically-motivated partition that has been pragmatically shown to provide enhanced insight into complicated noncovalent interactions. 71,73 In the present work, we present a computational implementation of an F-SAPT variant that is routinely applicable to molecules with thousands of atoms and tens of thousands of basis functions. A number of pragmatic choices are made to enable this. The first choice is that we do not explicitly compute the O(N 5 ) dispersion contributions according to SAPT0 - these are instead estimated empirically using DFT-D3. 81,82 A similar decision was made in Podeszwa and Szalewicz’s HFDc(1) method, 83 Heßelmann’s SAPT+D method, 84 and in Lao and Herbert’s XPol+SAPT+D method, 85 albeit with additional modifications to the specific form of the DFT-D dispersion correction. We note that the use of an empirical dispersion correction necessarily compromises the “ab initio” character of SAPT, but it is necessary from a pragmatic standpoint to treat larger systems. The second choice is that we eschew augmented functions to allow for efficient integraldirect computations. The justification for this is that the augmented functions primarily affect the dispersion term in SAPT0, which is now being treated via DFT-D3. A similar choice (even using the 6-31G** basis used for the first test system below) was made in Ochsenfeld’s linear-scaling variant of SAPT0. 58 These choices enable a computational implementation of F-SAPT based entirely on Fock matrix builds. This enables leveraging of our previously described algorithms for efficient evaluation of Coulomb and exchange matrices on graphical processing units (GPUs). 86–90 We emphasize that the tractability gains realized in this paper arise from the combination of approximations in dispersion and basis set treatments and the subsequent deployment of the resulting equations on GPU hardware. Below, we first present the existing SAPT0 and F-SAPT0 terms required in our variant of F-SAPT. We then show how these can be efficiently formulated in terms of a limited number of Fock-type potential matrix builds, and detail how to obtain an optimized implementation for the case where there is a large size disparity between the involved monomers. The

algorithm presented has been implemented in our Lightspeed rapid-prototyping library for electronic structure, and utilizes the IntBox implementation for GPU-based Fock matrix builds from TeraChem. 86–90 As a demonstrative example, we use the F-SAPT code to investigate the distance dependence of interaction energy contributions and interaction energy difference contributions in the chlorindole/MDM2 drug-protein and indinavir/HIV-II protease complexes (PDB IDs 4MDQ 91 and 1HSG, 92 respectively). Theory and Implementation: Notation: In this work, we consider a closed-shell dimer SAPT0 [a.k.a, SAPT(HF)] interaction between monomers A and B. We use a dimer-centered Gaussian basis set {φp (~r1 )} throughout. The following index classes are used in this work, A, B: All nuclei and electrons of the monomers. A, B: Functional group fragments of the monomers. A, B: Nuclei of the monomers. a, b: The canonical occupied RHF orbitals of the monomers. r, s: The canonical virtual RHF orbitals of the monomers. a ¯, ¯b: The localized occupied RHF orbitals of the monomers. µ, ν, ρ, σ, τ, υ: The dimer-centered Gaussian atomic orbital basis set. Primes are used to distinguish between different indices within the same class, e.g., a and a0 . A generalization of the Einstein summation convention is used throughout this work: any indices appearing only on the RHS of an expression are summed over. Computational Primitives: A number of key potential integrals occur repeatedly in the discussions below, and include nuclear-nuclear, (A|B) ≡

ZA ZB rAB

(1)

electron-nuclear, Z

d3 r1 φµ (~r1 )φν (~r1 )

(µν|B) ≡ − R3

1 ZB r1B

(2)

and electron-electron integrals (electron repulsion integrals), ZZ 1 (µν|ρσ) ≡ d3 r1 d3 r2 φµ (~r1 )φν (~r1 ) φρ (~r2 )φσ (~r2 ) r 6 12 R (3) where ZA is the nuclear charge of the Ath atom. Manipulations of these integrals are the rate-limiting operations in many electronic structure methods, including most SAPT approaches. To reduce this bottleneck, we formulate our FSAPT approach in terms of the following four critical primitive operations: V : Nuclear potential integrals of the form Vµν [ZB ] ≡ P (µν|B). B v: Electrostatic potential integrals of the form vA [Dρσ ] ≡ P (A|ρσ)D ρσ . ρσ J: Coulomb potential integrals of the form Jµν [Dρσ ] ≡ P (µν|ρσ)D ρσ . ρσ

ACS Paragon Plus Environment 2

Page 2 of 22

Page 3 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

K: Exchange potential integrals of the form Kµν [Dρσ ] ≡ P (µρ|νσ)D ρσ . ρσ Here Dρσ is a generalized one particle density matrix for the electrons. The first two primitives involve the nuclearelectron integrals, and are presently performed on CPU. The latter two involve the electron repulsion integrals, and rely on the IntBox GPU-accelerated library 86–90 which forms the core of the TeraChem and Lightspeed packages. All four of these routines utilize Cauchy-Schwarz screening 93 to remove (µν| pairs which are spatially distant in ~r1 , and likewise |ρσ) in ~r2 . The IntBox routines also utilize densitybased screening 93 and mixed precision techniques 89 to substantially accelerate the formation of the J/K matrices in cases where the input D matrix is sparse. In this approach, ρσ e.g., for J, the product Jµν ≡ (µν|ρσ)Dρσ is estimated on a shell-quartet basis as Bµν Bρσ |Dρσ |, where Bµν is the ρσ Schwarz bound for the (µν| pair. This estimate of Jµν is then used to determine whether the integral contribution should ρσ be done in double precision (typically for Jµν > 10−6 ), in −6 ρσ −14 single precision (typically for 10 > Jµν > 10 ), or else neglected entirely. The use of mixed precision is critical to achieve high efficiency on commodity GPU cards (such as NVIDIA GeForce), where double precision throughput is as low as 1/32× of single precision throughput. Using these approaches, IntBox can compute accurate J/K matrices without approximation for at least 30000 basis functions. In developing an efficient F-SAPT code below, there are three important considerations: • The F-SAPT energy should be formulated wholly in terms of the potential integral primitives discussed above, and the number of potential integral build calls should be minimized. • Where possible, the input density matrices Dµν to a J or K matrix build should be chosen to be spatially localized, to maximize density-based screening. • In iterative methods, incremental techniques 93 should be used to enhance density-based screening from one iteration to the next. SAPT0 Workflow: In SAPT0, the RHF wavefunctions of the monomers are first computed. The RHF wavefunction of the dimer is also computed, to provide the RHF interaction int AB A B energy EHF ≡ EHF − EHF − EHF for use in computing the (2) δHF,r correction below. After the RHF wavefunctions are obtained, the nuclear charges ZA /ZB and positions ~rA /~rB , canonical occupied orbital coefficients Cµa /Cµb , canonical virtual orbital coefficients Cµr /Cµs , canonical occupied orbital eigenvalues a /b , and canonical virtual orbital eigenvalues r /s are all available. Note that the molecular orbitals {ψa (~r1 ), ψr (~r1 )} are defined in terms of atomic orbitals {φµ (~r1 )} as, e.g., ψa (~r) ≡ Cµa φµ (~r). All SAPT terms can be computed from these quantities and various overlap and Coulomb integrals within the Gaussian basis set. The Fock-type terms of SAPT0 are now computed. The equations for these terms are well known, 27 and their efficient formulation in terms of generalized J/K builds has been understood for some time. 34,52,53 We present them here for completeness, and to clearly show the manipulations required to minimize the number of J/K builds in the F-SAPT analysis below. The simplest SAPT0 term is electrostatic: E elst10 ≡ (A|B) + 2(aa|B) + 2(A|bb) + 4(aa|bb)

Here the “10” designation in E elst10 is the SAPT convention of denoting perturbation series terms as E term(nm) , where n is the order in the intermolecular interaction potential VˆAB and m is the order in the intramolecular fluctuation potential ˆ A or W ˆ B . 27 This term is efficiently computed as, W A B A B E elst10 ≡ (A|B)+2Dµν (µν|B)+2Dρσ (A|ρσ)+4Dµν Dρσ (µν|ρσ) A B A B = (A|B)+2Dµν Vµν [ZB ]+2Dρσ Vρσ [ZA ]+4Dµν Jρσ [Dρσ ] (5) A Here, Dµν ≡ Cµa Cνa isPthe occupied RHF OPDM for monomer A, Vµν [ZB ] ≡ B (µν|B) is the nuclear potential generated by monomer B in the Gaussian basis set, and P B B Jµν [Dρσ ] ≡ ρσ (µν|ρσ)Dρσ is a generalized J matrix build. For this term, the needed V /J integrals are all available from the monomer RHF computations, so this term is essentially free. The next term is exchange, which is often computed in the S 2 approximation. One formula for this (valid only in a dimer-centered basis set) is,

E exch10 [S 2 ] ≡ −2Sab Sbr (ar|B) − 2Sba Sas (bs|A) − 2Sas Sbr (ar|bs) (6) Here the notation (ar|B) ≡ (ar|B)+2(ar|bb) means the total electrostatic potential of monomer B. The corresponding B quantity in the atomic orbital basis will be denoted Wµν ≡ B Vµν [ZB ] + 2Jµν [Dρσ ]. The overlap integrals are, Z Sµν ≡ Sµν ≡ d3 r1 φµ (~r1 )φν (~r1 ) (7) R3

The exchange energy in the S 2 approximation can be efficiently computed as, A B B E exch10 [S 2 ] = −2Dµν Sνρ Dρσ Sστ PτAυ Wυµ

A B −2Dµν Sνρ Pρσ Kµσ [DµB0 ν 0 Sν 0 ρ0 PρA0 σ0 ] A Here Pµν ≡ Cµr Cνr is the unoccupied RHF OPDM of P monomer A and Kµν [Dρσ ] ≡ ρσ (µρ|νσ)Dρσ is a generalized (and non-symmetric) K build. This term requires a single generalized K build beyond quantities available from the monomer RHF solutions. The full SAPT0 exchange term is,

E exch10 [S ∞ ] ≡ −2(ab|ab) 0

+ 2Taa [(aa0 |B) + 2(aa0 |bb) − (ab|a0 b)] 0

+ 2Tbb [(bb0 |A) + 2(bb0 |aa) − (ba|b0 a)] + 2Tab [(ab|A) + 2(ab|a0 a0 ) − (aa0 |ba0 )] + 2Tab [(ab|B) + 2(ab|b0 b0 ) − (ab0 |bb0 )] 00

+ 2Tab Tbb0 [2(ab|b0 b00 ) − (ab0 |bb00 )] 00

+ 2Tab Taa0 [2(ab|a0 a00 ) − (aa0 |ba00 )] 0

0

+ 2Taa Tbb [2(aa0 |bb0 ) − (ab0 |a0 b)] 0

+ 2Tab Tab0 [2(ab|a0 b0 ) − (aa0 |bb0 )]. (9) Here the T blocks are made of the relevant elements of the Hermitian matrix " # " #−1   0 0 0 Taa Tab δaa0 Sab δaa0 0 T ≡ − . = 0 0 0 0 δbb0 Tba Tbb Sba δbb0 (10) Despite the apparent complexity, this term can be easily constructed with only two new J and K builds corresponding

(4)

ACS Paragon Plus Environment 3

(8)

B A A −2Dµν Sνρ Dρσ Sστ PτBυ Wυµ

Journal of Chemical Theory and Computation 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

0

to the Taa and Tab generalized OPDMs. Specific formulae for this procedure can be found in previous work. 34,52,53 As detailed previously, 61,62 the full F-SAPT0 exchange energy is obtained by simply scaling the S 2 order-2 partition by the ratio between the S 2 and S ∞ SAPT0 terms - no explicit partition of the S ∞ exchange term is attempted. The total induction energy (including induction and exchange-induction) is computed in the S 2 approximation as, E ind20,u/r [A ← B, S 2 ] ≡ 2xB,u/r (ar¯|B0 ) (11) ar B,u/r

The induction amplitudes xar and the total induction potential (ar¯|B0 ) are defined over the next few equations. We show only the expression for the response of monomer A due to the electrostatic field of monomer B. The corresponding expressions for the response of monomer B due to the electrostatic field of monomer A can be obtained by permutation of indices in all expressions in this section. Here the induction amplitudes can either be computed in the uncoupled, xB,u ar ≡

(ar|B) a − r

(12)

or coupled, [δaa0 δrr0 (r − a ) + 4(ar|a0 r0 ) − (aa0 |rr0 ) − (ar0 |ra0 )]xB,r a0 r 0 = −(ar|B)

(13)

limits. The latter requires the solution of the monomer A coupled-perturbed Hartree-Fock (CPHF) equations with the right-hand side being the negative of the electrostatic potential of monomer B. Once the amplitudes are obtained, the induction energy is formed by the dot product of the amplitudes with the induction+exchange-induction potential (ar¯|B) ≡ (ar|B) + (ar˜|B). The exchange induction potential, in the S 2 approximation, is, (ar˜|B) ≡ −(ab|br) − Sab (br|A) − 2Sab (br|a0 a0 ) + Sab (ba0 |ra0 ) + Sab0 (aa0 |br) − 2Sab0 (ar|ba0 ) − Sbr (ab|B) − 2Sbr (ab|b0 b0 ) + Sbr (ab0 |bb0 ) 0

+ Sab Sbr0 (bb0 |A) + 2Sab Sbr0 (bb0 |a0 a0 ) + Sab0 Sbr0 (aa0 |B) 0

0

+ 2Sab0 Sbr0 (bb0 |a0 a0 ) − Sab0 Sbr0 (aa0 |bb0 ) + 2Sab0 Sba0 (ar|bb0 ) 0

0

0

0

0

+ Sab Sba0 (a0 r|B) + 2Sab Sba0 (a0 r|bb) − Sab Sba0 (a0 r|bb0 ). ∞

(14)

As with the S term for exchange, this term is less complex than it appears at first glance. It can be efficiently formulated with a new J and K build corresponding to the effec1 A B tive OPDM Oµν ≡ Dµρ Sρσ Dσν , and one J build each corre2 A B A sponding to the effective OPDMs Oµν ≡ Dµρ Sρσ Dστ Sτ υ Dυν 3 B A B and Oµν ≡ Dµρ Sρσ Dστ Sτ υ Dυν , as described in previous work. 34,53,61 (2) At this point, the “delta Hartree-Fock” correction δHF,r is computed to account for missing infinite-order induction and for missing exchanges in the S 2 treatment of exchangeinduction. This correction is taken to be the difference between the present SAPT expansion and the Hartree-Fock interaction energy. As detailed previously, 61,62 in A-SAPT/FSAPT, the order-2 partition of the full SAPT0 induction energy is obtained by scaling the uncoupled order-2 induction (2) partition to account for the effects of coupling and δHF,r the former avoids the expense of solving the CPHF equations for each fragment in question, while an explicit partition has not been developed for the latter. At this point in conventional SAPT0, the dispersion and exchange-dispersion contributions are computed with an

analog of second-order Møller-Plesset perturbation theory (MP2). However, this presently scales as O(N 5 ), and is not efficiently written in terms of generalized Fock builds. Though much progress has been made in accelerating this term via density fitting 44–51 and Laplace transform 94–97 techniques [and while the scaling could formally be reduced to O(N 4 ) scaling by tensor hypercontraction 98–100 ], this term remains a serious barrier to SAPT calculations on molecules with > 1000 atoms. Moreover, the results obtained at great effort in the uncoupled SAPT0 dispersion term are generally markedly inaccurate (e.g., substantially overbinding π-stacked systems 60,101 and progress is made only by carefully balancing the uncoupled dispersion approximation with a judicious choice of jun-cc-pVDZ basis set or by pursuing more advanced (and somewhat more expensive) coupled Casimir-Polder approaches to the dispersion energy. 32–35 In the present work, we choose to eschew explicit ab initio computation of the dispersion term, and instead invoke semi-empirical DFT-D3 81 to compute the dispersion. DFT-D3 writes the dispersion energy as a 6 6 parametrized sum of damped atom-pairwise CAB /rAB and 8 8 CAB /rAB terms (e.g., as would be found in a force field), and 8 6 coefficients and CA allows the construction of the atomic CA to be dependent on the fractional coordination-number of each atom (e.g., a geometric interpolation between sp2 and sp3 carbons). The use of DFT-D for the dispersion component not only simplifies the computation, but it also leads to a clear order-2 F-SAPT partition of dispersion (by performing sub-summations over the atoms in each fragment pair). A similar use of DFT-D to replace ab inito computation of the SAPT dispersion energy was previously introduced. 83–85 F-SAPT0 Order-2 Analysis: Up to this point, we have presented the relevant expressions for interaction energy terms in the total SAPT energy, e.g., E term . F-SAPT determ where composes this in terms of an “order-2” partition EAB A and B label functional groups on monomers A and B, respectively. This object describes the effective contribution of the interaction between functional groups A and B to the total interaction. Moreover, by construction, the sum of the F-SAPT order-2 energies is constrained to exactly the Padd toterm . total SAPT energy for each term, i.e., E term = AB EAB Thus F-SAPT is not a fragment-based approximation to SAPT, but rather an a posteriori partition of the total SAPT interaction energy. Another useful quantity is the “order-1” partition, which is obtained by summing the order-2 partition over the functional groups of oneP or the other monomer, P term term term i.e., EA ≡ B EAB or EBterm ≡ A EAB . This object describes how functional group A interacts with the totality of monomer B, and vice versa. The order-1 partition is a much lower-dimensional object than the order-2 partition, and can often provide valuable insights before the full order-2 partition is explored. At this point in the F-SAPT workflow, we localize the orbitals to obtain the IBOs 68 ψa¯ (~r1 ) ≡ Lµ¯a φµ (~r1 ) for each A monomer. A set of weights wA and wa¯A are determined to encapsulate how much each nucleus A or localized occupied orbital a ¯ is owned by a user-specified fragment A, e.g., the “50/50 link bond” rule 62 used herein (see below). The Focktype terms of the F-SAPT0 order-2 partition may then be formed, and consist of electrostatic, elst10 A B B EAB = wA wB (A|B) + 2wa¯A wB (¯ aa ¯|B)

ACS Paragon Plus Environment 4

Page 4 of 22

A B +2wA w¯b (A|¯b¯b)

+

4wa¯A w¯bB (¯ aa ¯|¯b¯b)

(15)

Page 5 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation elst10 built in EAB , via,

exchange, exch10 EAB [S 2 ] = wa¯A w¯bB

(16)

h i ¯ × −2Sa¯b S¯br (¯ ar|B0 ) − 2S¯ba¯ Sa¯s (¯bs|A0 ) − 2Sa¯s S¯br (¯ ar|¯bs)

2

← B, S ] =

B 2wa¯A wB Ua¯a Ua0 a¯

(ar|B) 0 ¯ (a r|B) (17) a − r

(ar|¯b¯b) 0 ¯ (a r|B) a − r In our previous work, we first formed the order-2 partition to the detailed level of local occupied orbitals and nuclei, A e.g., Ea¯elst10 and then applied the weights wA and wa¯A in ¯ b post-processing. Density fitting treatment of the electron repulsion integrals makes this tractable for O(100) atoms on current computers, and allows for flexibility in the choice of fragment partition without the need for a new F-SAPT computation. However, this approach is unsustainable for a large-scale Fock-based code, as it would require O(Na¯ ) J or K builds. To obviate this problem, we will directly form the term contributions to EAB , striving to minimize the number of J and K builds. One important consideration in this direction is that there are generally several ways to construct the order-2 partitions. E.g., for electrostatics, we may evaluate the electrostatic potential contributions from monomer A and trace these against the charge density contributions for monomer B, or vice versa. Explicitly, the electron-electron repulA B elst10 ← Dµν Jµν [Dρσ ] sion term could be formulated as EAB B A elst10 ← Dµν Jµν [Dρσ ]. If the two monomers (and assoor EAB ciated fragmentation patterns) are identical, both schemes will require the same number of J matrix builds. However, in many cases, one of the monomers is considerably smaller than the other in terms of number of atoms and number of fragments (e.g., the case of a small ligand bound to a large protein target). In such cases, it is advantageous to form the electrostatic potential contributions corresponding to the smaller monomer, resulting in fewer and sparser J builds. Below, we assume that monomer A is the smaller molecule, and present equations optimized for this case. For electrostatics, h i elst10 B A A EAB = wB (B|A)wA + 2vB [Dρσ ] (18) h i A A +2w¯bB Lµ¯b Vµν [ZA wA ] + 2Jµν [Dρσ ] Lν¯b +4wa¯A w¯bB Ua¯a Ua0 a¯

A Here Dρσ ≡ Lρ¯a wa¯A Lσ¯a is the fragment density matrix. This requires one v, one V , and one J build per fragment in monomer A. Moreover, the involved density matrices are fragment densities, and will therefore be quite sparse (together they sum to the total density of monomer A). For exchange,  ¯ exch10 EAB [S 2 ] = −2wa¯A w¯bB Sa¯b S¯br (¯ ar|B0 ) + Sa¯s (¯bs|A0 ) (19) A −2w¯bB Lµ¯b Sµν Pνρ Kρσ [OρA0 σ0 ]Lσ¯b

Here Oρ0 σ0 ≡ DρA0 σ0 Sσ0 τ 0 PτB0 υ0 . The last contribution requires one non-symmetric K build per fragment in monomer A. The input density matrices codify the overlap of the fragment orbitals on A with the virtual space of B, and are therefore at least as sparse as the fragment density matrices A Dρσ . For the induction of monomer B due to the electrostatic ind20,u potential of monomer A, e.g., EAB [B ← A, S 2 ], the order2 partition can be computed with the potential integrals

h i A A (bs|A) ≡ Cµb Vµν [ZA wA ] + 2Jµν [Dρσ ] Cνs

(20)

(21)

are the potential integrals from equation 18. For the induction of monomer A due to the electrostatic ind20,u potential of monomer B, e.g., EAB [A ← B, S 2 ], the order2 partition is formed via, A ind20,u B A ]Lν¯b EAB [A ← B, S 2 ] = wB vB [Zrs ] + 2w¯bB Lµ¯b Jµν [Zρσ (22) where, wA Ua¯a Ua0 a¯ (a0 r|B) A Zρσ ≡ 2Cρa a¯ Cσr (23) a − r This requires one v and one J build per fragment in monomer A. Computational Details: The F-SAPT analysis discussed above was implemented as a simple Python module using the Lightspeed electronic structure library, which, in turn, relies on TeraChem’s IntBox 86,87,89 J/K integral library. The dimer/monomer SCF computations and the monomer CPHF computations are all performed using incremental Fock build techniques with direct inversion of the iterative subspace (DIIS) convergence acceleration. 102 Orbital localization is performed using Knizia’s intrinsic bond orbital (IBO) approach, 68 with separate localization performed for the core and valence orbitals. The dimer-centered cc-pVDZ-minAO basis set (the core and valence functions from the fully contracted cc-pVDZ basis set) is used to form the intrinsic atomic orbitals underlying IBO. A and wa¯A are determined from a userThe weights wA specified atomic fragmentation pattern according to the “50/50 link bond” rule. 62 Here, a localized doubly-occupied orbital is classified by orbital atomic charges as being either wholly owned by one fragment, or as being a linking σ bond between two fragments. For each linking σ bond, one proton from each of the two involved atoms is affiliated with the σ bond (neutralizing the σ bond and allowing for only dipolar contributions from the link unit). The link unit is then divided evenly (i.e., “50/50”) between the two involved fragA and ments. This allows for the formation of the weights wA A wa¯ immediately after the division of the link units, avoiding separate F-SAPT computations explicitly involving the link units. Simple rescaling to account for infinite-order exchange, (2) coupled induction, and δHF,r are applied to adjust the FSAPT partition, as discussed previously. 61,62 Grimme’s dftd3 code 81 is used in file-based mode to provide an estimate of the order-2 partition of the dispersion term - this code provides a pairwise-atomic breakdown of the dispersion energy as a command line option. These are simply added for all atom pairs within a fragment pair to obtain disp(−D3) the EAB partition. The Becke-Johnson parametrization of the -D3 damping for Hartree-Fock is selected for this analysis. 82 Note that the Hartree-Fock variant of -D3 is somewhat contaminated by intramolecular correlation contributions to electrostatics, exchange, and induction accrued during fitting. This will be removed in future work by moving to SAPT(DFT) treatment of electrostatics, exchange, and induction, and by choosing a form of the emperical dispersion correction that is designed to more-accurately rep-

ACS Paragon Plus Environment 5

(bs|A) 0 ¯ (b s|A) b − s

where,

and induction, ind20,u [A EAB

ind20,u EAB [B ← A, S 2 ] = 2w¯bB Ub¯b Ub0 ¯b

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

resent the dispersion energy. 83–85 Results and Discussion: Chemical Systems: To demonstrate the efficiency and potential applications of our new F-SAPT implementation, we focus on two drug-protein complexes derived from crystal structures: PDB IDs 4MDQ and 1HSG. The structure and details of these systems are depicted in Figure 1. 4MDQ is a 2.12 ˚ A resolution crystal structure of a complex of 2× cholorindole-based ligands bound to MDM2. 91 Potent and specific antagonists of MDM2 prevent ubiquitination of the p53 tumor suppression protein, and are a promising route to nongenotoxic cancer treatment. Additionally, 4MDQ is one of the smallest known crystal structures featuring non-trivial drug-protein interactions. We apply F-SAPT to the full complex, in basis sets up to 6-31G**. 4MDQ provides a good testbed for investigating the distance decay of both the interaction energy and the difference in interaction energy upon ligand substitution in a full drug-protein system - in this work we will substitute the chloroindole-based ligands by methylindole variants, and probe the effect of this substitution on the interaction energy. The MDM2 protein unit in 4MDQ consists of a single chain of 86 residues, including a number of prolines. 80 crystal waters are included (assigned to the protein monomer), including a mix of structural and nearby bulk waters. The overall charge of the protein/water monomer is +5. There are two structurally identical ligands, each with 63 atoms divided into 7 neutral functional groups as shown in Figure 1C. In discussions below, we will refer to the 28W ligand with the more-buried chloroindole unit as “28W-1” (resid 400-406 in 4MDQ), and the 28W ligand with the less-buried chloroindole unit as “28W-2” (resid 500-506 in 4MDQ). The full 4MDQ system has 1831 atoms, including ligands and water molecules. 1HSG is a 2.0 ˚ A resolution crystal structure of the ligand indinavir (marketed as Crixivan) bound to HIV-II protease. 92 Indinavir was developed as a protease inhibitor. 1HSG has an additional significance in the theoretical noncovalent interactions community as a popular test set of small (< 50 atom) dimers extracted from localized ligand-protein contacts from within the crystal structure. 8 We model the complete 1HSG complex with up to a 6-31G* basis set in the present study. The HIV-II protease protein unit in 1HSG consists of a homo-dimer of 2× 99-residue chains, including a number of prolines, for a total of 198 aminoacid residues. 89 crystal waters are included (assigned to the protein monomer), including a mix of structural and nearby bulk waters. The overall charge of the protein/water monomer is +4. The indinavir ligand “MK1” is neutral, has 92 atoms, and is divided into 8 neutral functional groups according to Figure 1D. The full 1HSG system has 3487 atoms, including the ligand and water molecules. Both of these systems were prepared from crystal structure via AMBER/GAFF models and MM minimizations. The functional group fragmentation patterns for the ligands are supplied by the user specifying different residue names for the various fragments of the ligand in the input PDB file - these are shown in Figure 1C-D. For the protein/water monomer, a standard fragmentation scheme into amino acid side chains (including Cα atoms) and CONH peptide bond fragments is adopted as depicted in Figure 1E. This decomposition is automatically inferred from the input PDB file. For prolines, the corresponding peptide bond fragment is reduced to CON, and the N-Cδ σ bond is shared between the

proline side chain and the CON peptide bond moiety. None of the present systems contain disulfide bonds, but these are easily handled (indeed, they were present in our work on S1 binders in Factor Xa): the S-S moiety is treated as an additional fragment, with two linking σ bonds to the involved cysteine residues. Computational Timings Study for 4MDQ: Table 1 contains the detailed timings for F-SAPT/6-31G** on 4MDQ, run on a node with 8× NVIDIA GTX 970 GPUs, 2× Intel Xeon E5-2643 quad-core CPUs @ 3.30 GHz, and 400 GB RAM. This system has 1831 atoms, 6760 electrons, and 17809 basis functions. The algorithm presented above (the left-side “A@B” column) can provide the F-SAPT order2 decomposition in 20.4 hours of wall time, using approximately 250 GB of RAM. Almost exactly half of the wall time (11.2 hours) is spent on the dimer and monomer RHF computations. 4.1 hours is spent forming the Fock-type terms in SAPT0, including 1.7 hours of monomer CPHF equations. Orbital localization and fragment weight determination requires 1.0 hours, though we note that the Jacobi-rotationbased simultaneous diagonalization algorithm of our implementation of IBO is not optimized. An explicit computation of the order-1 F-SAPT partition requires only 0.25 hours of wall time (negligible), while the order-2 F-SAPT partition requires 3.9 hours of wall time (20% of the total wall time). Thus, the extra overhead of providing the order-1 and/or order-2 F-SAPT partitions is minor relative to the total SAPT computation. Table 1 also demonstrates the importance of optimizing the F-SAPT algorithm for the case where one monomer is much smaller than the other in terms of number of atoms and number of functional groups. The middle “B@A” bar shows the timings for the case where we exchange the ligand and protein as logical monomers A and B, respectively. The order-2 partition obtained from either ordering is numerically identical (subject to the permutation of A and B labels). However, in the B@A case, the order-2 F-SAPT partition time rises from 3.9 hours to 64.3 hours (a 16.4× cost increase), due to the need to form Fock-type potentials for 251 protein fragments, vs. the previous requirement of Fock-type potentials for 14 ligand fragments. The B@A total wall time jumps from 20.5 hours to 80.9 hours, a 3.9× increase in total cost. Computational Timings Study for 1HSG: Table 1 also shows the detailed timings for F-SAPT/6-31G* on 1HSG, run on a node with 6× NVIDIA GTX 1080 Ti GPUs, 2× Intel Xeon E5-2643v4 quad-core CPUs @ 3.40 GHz, and 768 GB RAM. This system has 3487 atoms, 10119 electrons, and 28456 basis functions. The F-SAPT algorithm developed in this work can produce the order-2 F-SAPT partition in less than 40 hours of wall time, using roughly 450 GB of RAM. Slightly over half of the wall time is spent on the HartreeFock interaction energy computation. Less than one quarter of the wall time is spent on the SAPT0 computation, while roughly one-quarter of the wall time is spent on the F-SAPT portion of the computation. Notably, most of the F-SAPT time is spent on orbital localization and functional group fragmentation (5.3 hours of wall time) - this is entirely due to the unoptimized nature of our simultaneous diagonalization algorithm for the IBO orbital localization. Also noteworthy is the fact the linear algebraic steps start to become dominant for such a large-scale system. In attempting to push beyond systems of this size and basis quality, one encounters barriers in total RAM footprint, GPU memory

ACS Paragon Plus Environment 6

Page 6 of 22

Page 7 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

footprint, and computational time, in roughly the indicated order. By using QM/MM methods with large QM regions, one can contemplate the application to proteins with several hundred residues and an explicit solvent environment. Efforts are underway to develop such a QM/MM-based FSAPT method, and we anticipate that this will prove useful for high throughput screening of large-scale drug-protein complexes in the near future. Order-1 F-SAPT Study for 4MDQ: Figure 2 depicts the order-1 F-SAPT interaction energy partition for 4MDQ projected onto the molecular geometry. In such plots, all atoms of each functional group are colored according to the total contribution for the functional group. The order-1 contribution for each functional group should be interpreted to mean the extent to which the functional group attracts (red) or repels (blue) the totality of the opposite monomer. Note that the color scales saturate at ± 5 kcal mol−1 . The interested reader is encouraged to explore these plots interactively in 3D, using PyMol and data/visualization scripts in the supplementary material - this allows a much more detailed view than the static 2D picture required for presentation in this manuscript. A different view of the order-1 data for the protein is depicted in Figure 3. This figure shows the absolute value of the residual interaction energy, defined as,   X |∆EBint | ≡  EBint0  − E int (24) B0 ≤B Here the protein fragments B are sorted by closest pairwise contact with any ligand atom. The residual interaction energy is interpreted as the magnitude of the total remaining interaction energy from residues that are further from the ligand(s) than B, and provides an indicator of the distancebased sub-region of the protein system required to converge the various SAPT terms. In Figure 3 and similar analyses (see below), we provide a breakdown of the residual interaction energy along the lines of SAPT term and protein functional group type (side chains, peptide links, and waters). Note that fluctuations or intermediate excursions toward zero in this analysis are indicative of oscillations in the running interaction energy sum. An intriguing, but ultimately simple set of findings emerge from this analysis. Overall, exchange and induction of the protein caused by the ligand (IndBA) decay very rapidly, while electrostatics, induction of the ligand caused by the protein, and dispersion all decay more slowly. At short nearest-neighbor contact distances of ∼ 3 ˚ A, dispersion, exchange, and electrostatic terms all are significant contributors to the residual interaction energy. The side chain and water contributions dominate at these distances, with residual interaction energy contributions on the order of 100 kcal mol−1 . Peptide link residuals are only on the order of 10 kcal mol−1 for these distances, simply because there are few peptide-link contacts at < 4 ˚ A closest contact. Initial convergence of all SAPT interaction energy contributions is relatively rapid, with residuals permanently falling below 10% of their original values by the time residues with closest contacts out to 5 ˚ A are included. However, the residuals exhibit an absolute error of 1 to 3 kcal mol−1 at 5 ˚ A, with individual residual contributions on the order of 1 kcal mol−1 remaining for dispersion, electrostatics, and induction of the ligand by the protein. Remarkably, dispersion is the dominant residual contributor out to ∼ 7 ˚ A, with the

dominance stemming from both side chain and protein contributors (in water, dispersion and electrostatics are of equal importance for these distances). This is entirely expected, as dispersion is a negative-definite quantity, with all pairwise terms having the same sign. By contrast, the traditionally “long-ranged” electrostatic terms are tempered by the possibility of either a positive or negative contribution due to the signs of the charges and/or dipoles in the numerators of these terms. If the figures are replotted with the cumulative summations performed on the absolute value of each contribution (see supplemental information), the electrostatic and induction terms become the dominant contributors across the full range of R. Beyond 5 ˚ A, dispersion continues to decay monotonically and rapidly, while the decay rates of both electrostatics and induction of the ligand by the protein stagnate and become the dominant contributions to the residual. Interestingly, electrostatics and induction of the ligand by the protein are nearly equal contributors to the residual throughout the long-range area. This is understandable upon consideration of the physics of the induction term within F-SAPT: the entire protein polarizes the ligand, and then the electrostatic potential of the ligand difference density is traced against each of the protein fragments. Therefore, a contribution from a permanent ligand dipole in electrostatics enters the F-SAPT partition in the same way as an induced dipole in induction, and the relative dominance of these two terms in the long-range limit will be determined by the relative magnitudes of the permanent and induced dipole moments. Overall, the total SAPT interaction energy residual falls below < 1 kcal mol−1 at a closest contact distance ∼ 11 ˚ A. The long range needed to converge the SAPT interaction energy appears to be driven by electrostatic and/or induction contributions from charged protein side chains - the peptide links and structural water contributions are converged to < 1 kcal mol−1 by 8 ˚ A and 5 ˚ A, respectively. Overall, the findings from this section regarding the possibility of using a subsection of the protein in drug-ligand binding studies is mixed. The presence of significant longranged ligand dipole and protein charge and/or ligand contributions necessitates that a significant portion of the protein be used (out to ' 11 ˚ A in this case). Tempering this, the long-range portions of the protein are only involved in electrostatic source terms and induction of the ligand by the protein. Therefore, it is highly plausible that QM/MM methodology would accurately capture the long-range portion of the interaction energy, enabling the use of a considerably smaller quantum region than the full protein. E.g., for 4MDQ, the exchange and induction of the protein by the ligand have all decayed to well below 1 kcal mol−1 by 4 to 5 ˚ A, and charge penetration contributions to electrostatic and induction terms are surely negligible past these distances. This suggests that one could accurately use a quantum region composed of closest contacts out to ∼ 5 ˚ A, and then a non-polarizable MM region beyond this out to the full scale of the protein. Note that this is a quantum region of considerable size - on the order of 670 quantum atoms. Order-1 Difference F-SAPT Study for 4MDQ: Figure 4 depicts the order-1 difference F-SAPT interaction energy partition for 4MDQ with Cl vs. Me substituents. The order-1 contribution for each functional group should be interpreted to mean the extent to which the functional group is more favorable in Cl (red) or more favorable in Me (blue). The residual difference interaction energy is shown in Figure

ACS Paragon Plus Environment 7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

5. The difference interaction energy (-2.6 kcal mol−1 ) is much smaller than the total interaction energy (-187 kcal mol−1 ). Significant SAPT terms in the difference interaction energy are electrostatics (-2.7 kcal mol−1 ), exchange (-1.8 kcal mol−1 ), and dispersion (+1.6 kcal mol−1 ). The two induction terms carry a rather negligible contribution (+0.3 kcal mol−1 ). Considering the ligands, the two directly-substituted L05 residues carry the bulk of the difference interaction energy (-0.83 and -2.61 kcal mol−1 , respectively). The other ligand residues contribute a collective difference interaction energy of +0.85 kcal mol−1 , with the largest single non-substituted ligand residue contributing +0.32 kcal mol−1 . This indicates that differential intramolecular three-body effects involving two ligand fragments and the protein are small, and implies that a reduced model of the ligand, e.g., chloroindole vs. methylindole can be safely used to qualitatively probe the nature of the substitution. Considering the protein contributions, the difference energy is carried by a few close-lying fragments for exchange and dispersion. For electrostatics, the difference energy is carried by a few protein fragments, but these are spread throughout the protein. Closer inspection of the most significant long-ranged electrostatic contributions reveals that these are residues with charged side chains or charged terminal residues. The decay plots in Figure 5 reveal that all terms except electrostatics have converged to less than 0.1 kcal mol−1 within a closest contact distance of 5 ˚ A. Electrostatics decay much more slowly, with a residual of over 0.1 kcal mol−1 remaining out to nearly 15 ˚ A. The side chain and peptide link fragments both contribute to this slow decay. The peptide link fragments each have relatively small contributions past 5 ˚ A, but cumulatively add to a significant total. The side chain fragments individually have similar magnitudes as the peptide links, except for charged side chains, which may carry difference energy contributions of several tenths of one kcal mol−1 even at distances approaching 15 ˚ A. The overall finding from this section is that, for ligand substitutions of this type, explicit quantum mechanical effects must be included out to ∼ 5 ˚ A. Past this distance, the difference interaction energy has significant contributions from electrostatics, which likely can be accurately modeled with classical point charges. Order-2 Difference F-SAPT Study for 4MDQ: Figure 6 depicts the order-2 F-SAPT partition of the difference interaction energy in 4MDQ for the substituted L05 ligand residues on 28W-1 and 28W-2. The order-2 analysis allows us to probe the origins of the enhancement for each ligand residue separately. The total enhancement of -0.83 kcal mol−1 for 28W-1 L05 is driven primarily by reduced steric repulsion in the Cl variant (−2.21 kcal mol−1 ), tempered by reduced dispersion in the Cl variant (+1.43 kcal mol−1 ). These differences come primarily from close contacts between 28W-1 L05 and LEU33, ILE37, PHE67, and ILE75. 28W-1 L05 has a relatively negligible net electrostatic difference of +0.1 kcal mol−1 , though there are a number of significant individual contributors throughout the protein which cancel with each other to give this negligible value. The total enhancement of -2.61 kcal mol−1 for 28W-2 L05 is driven primarily by more-favorable electrostatics (-3.21 kcal mol−1 ). The strongest contributor to the electrostatics

is the close contact with ARG73 (-3.30 kcal mol−1 ), though there are other canceling electrostatic contributions throughout the protein. Exchange is also an important contributor for 28W-2 L05 (+0.58 kcal mol−1 ), with the close contact with ARG73 again dominating (+1.10 kcal mol−1 ). These findings are entirely consistent with chemical intuition: 28W-1 L05 achieves a modest enhancement of -0.83 kcal mol−1 by achieving a more-favorable steric/dispersion contribution within its buried and non-polar binding pocket, while 28W-2 L05 achieves a larger enhancement of -2.61 kcal mol−1 by improving its electrostatic interaction with the charged ARG73 in close contact at the periphery of protein. Order-1 F-SAPT Study for 1HSG: Figure 7 depicts the order-1 F-SAPT interaction energy partition for 1HSG projected onto the molecular geometry. Figure 8 contains the corresponding plot for the residual interaction energy. This system provides an interesting complement to 4HSG due to the larger size of the system and nearly buried nature of the ligand binding pocket in 1HSG. At close range, exchange, dispersion, and electrostatics dominate the SAPT total interaction energy. These terms all decay to below 10% of their original values by 5 ˚ A, but significant residuals on the order of 5 kcal mol−1 remain. Exchange and induction of the protein by the ligand decay rapidly past 5 ˚ A, while the other terms decay more slowly. As with 4MDQ, dispersion is found to dominate at intermediate distances, and is the dominant or co-dominant term out to 10 ˚ A in 1HSG. At distances exceeding 10 ˚ A, the induction of the ligand by the protein is the dominant term, and retains residuals on the order of 1 kcal mol−1 out to 18 ˚ A. Major contributions to this term at these distances arise wholly from charged side chain residues. Note that electrostatics has converged to a residual of less than 1 kcal mol−1 by 8 ˚ A. Overall this is unsurprising: in familiar terms, the effect from the induced dipole on the ligand is larger than that from the permanent dipole on the ligand. However, this highlights the fact that electrostatics and induction must generally be considered on the same footing in large-scale noncovalent interactions. In this system, side chain residues, particularly charged chain residues, must be retained out to 18 ˚ A to converge the residual to 1 kcal mol−1 . Peptide links must be retained out to 8 ˚ A and waters out to 7 ˚ A for the same convergence threshold. As with 4MDQ, exchange and induction of the protein by the ligand have decayed well below 1 kcal mol−1 by 5 ˚ A. Charge penetration effects in electrostatics and induction will surely be negligible past these distances, so it is quite plausible that a QM/MM variant of F-SAPT with a quantum region of 5 ˚ A could provide accurate results. Note however that this is still a very large quantum region - on the order of 700 quantum atoms. Previous work has shown that large quantum regions are required for accurate results (including excitation energies, 103 reaction barriers, 104–106 NMR shieldings, 107,108 and redox energies 109 ) in QM/MM calculations, so this may not be surprising. Order-2 F-SAPT Study for 1HSG: Figure 9 depicts the order-2 partition of the F-SAPT/6-31G* total interaction energy for 1HSG. This analysis can provide insight into the importance and nature of binding for each fragment of the indinivar ligand. For instance, L02 is a t-butyl group interacting with a non-polar region of the binding pocket. The FSAPT shows strongly favorable interactions with the pocket VAL32, ILE47, ILE84, ILE149 residues. Consideration of the individual SAPT terms (not shown - data in supporting

ACS Paragon Plus Environment 8

Page 8 of 22

Page 9 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

information) indicates that these favorable interactions arise from dispersion and electrostatics (presumably charge penetration) and are tempered by exchange. A few more longranged interactions occur in the electrostatic terms involving nearby charged side chain residues, but the interaction energy for L02 remains much more localized than the other ligand residues. For another example, a striking feature on the F-SAPT order-1 plot is the protein residue ARG8 (the dark blue residue in the central foreground of the Total panel on Figure 9, which provides an unfavorable contribution of +5.4 kcal mol−1 . Order-2 F-SAPT analysis reveals that this unfavorable interaction stems from electrostatic interactions with ligand fragments L05 and L06. In the case of L05ARG8 (+5.1 kcal mol), the unfavorable interaction arises due to unfavorable peptide dipole alignment in L05 relative to the charged ARG8 side chain. In the case of L06-ARG8 (+4.2 kcal mol−1 ), the unfavorable interaction arises due to an unfavorable hydrogen bond between the L06 hydroxyl group and the nearby charges ASP128 (which is forming a salt bridge with ARG8). In both of these cases a favorable interaction is achieved with ASP128, tempering the unfavorable interaction with ARG8. Thus, proposing a modification to L05 or L06 to completely remove the unfavorable interaction with ARG8 would doubtless prove unfavorable in the net by also deleting the favorable interaction with ASP128. However, it may well be possible to tune L05 or L06 to bind more favorably with ASP128 and/or less unfavorably with ARG8. E.g., one could imagine a geometrical or chemical change to shorten the L06-ASP129 hydrogen bond, increasing the L06-ASP129 hydrogen bond strength and/or diminishing the unfavorable L06-ARG8 dipole-charge interaction. In such cases, F-SAPT should prove useful to anticipate the effect of the proposed substitution on the interaction energy, including illuminating any possible favorable/unfavorable long-range side effects throughout the rest of the protein pocket. One observation from working with the order-2 F-SAPT partition is the large volume of data produced by the analysis. For 1HSG, there are 6 SAPT terms, 8 ligand residues plus the total, and 482 protein fragments, yielding 26028 unique F-SAPT order-2 contributions. Molecule-based colormaps such as those found in Figure 9 are highly effective at compressing this data into an easily absorbed form, but much remains to be done to improve the workflow. We have found that viewing these plots in interactive 3D mode is far more useful than viewing static 2D images - instructions on how to use PyMol to view the F-SAPT data are provided in the supporting information. In the future, we anticipate that a customized GUI with options to view arbitrary SAPT terms, subsets of the order-2 data, and detailed information for each fragment will significantly simplify F-SAPT analysis. Summary and Outlook: We have developed a largescale implementation of F-SAPT which is capable of providing the order-2 F-SAPT partition for systems with over 3000 atoms and over 25000 basis functions, e.g., full ligandprotein binding complexes. This advance was accomplished by 1) replacing explicit computation of the SAPT dispersion term with Grimme’s DFT-D3 empirical dispersion correction, 2) considering basis sets without augmented functions [in full SAPT(HF) these are primarily used to provide fa(20) vorable error cancellation in the Edisp term, which has been replaced by -D3], 3) implementing F-SAPT in terms of a limited number of J and K matrix builds, and 4) exploiting

previous advances in J/K evaluation on GPU hardware. We demonstrated the capabilities of the new F-SAPT code by application to two large drug-protein complexes: 4MDQ and 1HSG. A combination of order-1 and order-2 F-SAPT partitions were shown to effectively identify and quantify the significant contributors to the drug-protein interaction, including both total interaction energy and difference interaction energy upon ligand substitution. Analysis of the distance-dependent decay of the SAPT interaction energy and difference interaction energy revealed a number of key findings that have implications on the appropriate model system to use for large-scale noncovalent interactions. Notably, all SAPT terms must be included quantum mechanically out to ∼ 5 ˚ A closest contact distance. At intermediate distances of ∼ 5 − 8 ˚ A, dispersion is often observed to be the dominant term, though of course this can be captured by an empirical dispersion correction. At distances > 8 ˚ A, either electrostatics or induction of the ligand by the protein can be dominant, and these terms decay extremely slowly. However, both of these electrostatic source terms may be well-described by classical atom-centered point charges at these distances. Interaction energy differences upon ligand substitution decay disappointingly slowly due to differential electrostatic interactions with charged sides chains of the protein. Therefore, it seems that the appropriate minimal model system for large-scale noncovalent interactions is a large QM/MM variant of F-SAPT with a quantum region containing all residues within a ∼ 5 ˚ A closest contact distance of the ligand, and classical electrostatic sources and empirical van der Waals interactions for the rest of the system. Such a QM/MM-based F-SAPT code would still require a very large QM region - on the order of 700 atoms for both complexes considered in this work. There exist a number of future developments on both theory and application of F-SAPT. On the theory side, an immediate but straightforward pair of advances can be had by merging the present approach with SAPT(DFT) methodology (for the electrostatic, exchange, and induction terms) and QM/MM methodology. This would provide an improvement in the accuracy of the method, and would extend the applicability of F-SAPT from the small ∼ 200-residue proteins we treated here out to the standard ∼300-500-residue proteins forming the bulk of biochemistry (including explicit solvent). Another important extension is to explore variations of the DFT-D flavor beyond the simple DFT-D3(BJ) variant used here, e.g., the SAPT-specific DFT-D variants developed by Podeszwa and Szalewicz, 83 Heßelmann, 84 and by Lao and Herbert. 85 Yet another task is the development of a GUI workflow for the rapid analysis of F-SAPT partitioned interaction energies. We are hopeful that this new F-SAPT implementation will enable a number of exciting studies in drug-protein binding, molecular crystal packing, and catalyst design. Supporting Information: Molecular geometries, LightSpeed output files, raw F-SAPT order-2 data, and PyMol visualization scripts. This information is available free of charge via the Internet at http://pubs.acs.org/. Financial Disclosure: TJM is a cofounder of PetaChem LLC. Acknowledgements: This work was supported by the National Science Foundation (ACI-1450179). We thank Dr. Woody Sherman of Silicon Theraputics for bringing 4MDQ to our attention. We thank Dr. Xin Li for providing code for the nuclear and electrostatic potential integrals

ACS Paragon Plus Environment 9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

in Lightspeed. RMP thanks Prof. C. David Sherrill and Prof. Edward G. Hohenstein for inspiring discussions on FSAPT-related topics.

(25)

References

(26)

(1) Bondarev, D. A.; Skawinski, W. J.; Venanzi, C. A. Nature of Intercalator Amiloride-Nucleobase Stacking. An Empirical Potential and Ab Initio Electron Correlation Study. J. Phys. Chem. B 2000, 104, 815–822. (2) Kumar, A.; Elstner, M.; Suhai, S. SCC-DFTB-D Study of Intercalating Carcinogens: Benzo(a)Pyrene and Its Metabolites Complexed with the G-C Base Pair. Int. J. Quantum Chem. 2003, 95, 44–59. (3) Langner, K. M.; Kedzierski, P.; Sokalski, W. A.; Leszczynski, J. Physical Nature of Ethidium and Proflavine Interactions With Nucleic Acid Bases in the Intercalation Plane. J. Phys. Chem. B 2006, 110, 9720–9727. (4) Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Predicting Absolute Binding Free Energies to a Simple Model Site. J. Mol. Biol. 2007, 371, 1118–1134. (5) Hill, J. G.; Platts, J. A. Local Electron Correlation Descriptions of the Intermolecular Stacking Interactions Between Aromatic Intercalators and Nucleic Acids. Chem. Phys. Lett. 2009, 479, 279–283. (6) Li, S.; Cooper, V. R.; Thonhauser, T.; Lundqvist, B. I.; Langreth, D. C. Stacking Interactions and DNA Intercalation. J. Phys. Chem. B 2009, 113, 11166–11172. (7) Kol´ aˇ r, M.; Kubaˇ r, T.; Hobza, P. Sequence-Dependent Configurational Entropy Change of DNA Upon Intercalation. J. Phys. Chem. B 2010, 114, 13446–13454. (8) Faver, J. C.; Benson, M. L.; He, X.; Roberts, B. P.; Wang, B.; Marshall, M. S.; Kennedy, M. R.; Sherrill, C. D.; Merz, K. M. Formal Estimation of Errors in Computed Absolute Interaction Energies of Protein-Ligand Complexes. J. Chem. Theory Comput. 2011, 7, 790–797. (9) Ryde, U.; Sderhjelm, P. Ligand-Binding Affinity Estimates Supported by Quantum-Mechanical Methods. Chem. Rev. 2016, 116, 5520–5566. (10) Duan, L.; Liu, X.; Zhang, J. Z. Interaction Entropy: A New Paradigm for Highly Efficient and Reliable Computation of ProteinLigand Binding Free Energy. J. Am. Chem. Soc. 2016, 138, 5722–5728. (11) Ehrlich, S.; Gller, A. H.; Grimme, S. Towards full QuantumMechanics-based ProteinLigand Binding Affinities. Chem. Phys. Chem 2017, 18, 898–905. (12) Wheeler, S. E.; Houk, K. N. Substituent Effects in the Benzene Dimer are Due to Direct Interactions of the Substituents with the Unsubstituted Benzene. J. Am. Chem. Soc. 2008, 130, 10854–10855. (13) Bloom, J. W. G.; Wheeler, S. E. Taking the Aromaticity out of Aromatic Interactions. Angewandte Chemie International Edition 2011, 50, 7847–7849. (14) Lu, T.; Wheeler, S. E. Origin of the Superior Performance of (Thio)Squaramides over (Thio)Ureas in Organocatalysis. Chem. Eur. J. 2013, 19, 15141–15147. (15) Wheeler, S. E.; Seguin, T. J.; Guan, Y.; Doney, A. C. Noncovalent Interactions in Organocatalysis and the Prospect of Computational Catalyst Design. Acc. Chem. Res. 2016, 49, 1061–1069. (16) Podeszwa, R.; Rice, B. M.; Szalewicz, K. Predicting Structure of Molecular Crystals From First Principles. Phys. Rev. Lett. 2008, 101, 115503. (17) Ringer, A. L.; Sherrill, C. D. First Principles Computation of Lattice Energies of Organic Solids: The Benzene Crystal. Chem. Eur. J. 2008, 14, 2542–2547. (18) Podeszwa, R.; Rice, B. M.; Szalewicz, K. Crystal Structure Prediction for Cyclotrimethylene Trinitramine (RDX) From First Principles. Phys. Chem. Chem. Phys. 2009, 11, 5512– 5518. (19) Beran, G. J. O.; Nanda, K. Predicting Organic Crystal Lattice Energies with Chemical Accuracy. J. Chem. Phys. Lett. 2010, 1, 3480–3487. (20) Huang, Y.; Shao, Y.; Beran, G. J. O. Accelerating MP2C dispersion corrections for dimers and molecular crystals. J. Chem. Phys. 2013, 138, 224112. (21) Beran, G. J. O.; Hartman, J. D.; Heit, Y. N. Predicting Molecular Crystal Properties from First Principles: FiniteTemperature Thermochemistry to NMR Crystallography. Acc. Chem. Res. 2016, 49, 2501–2508. (22) Pejov, L.; Panda, M. K.; Moriwaki, T.; Naumov, P. Probing Structural Perturbation in a Bent Molecular Crystal with Synchrotron Infrared Microspectroscopy and Periodic Density Functional Theory Calculations. J. Am. Chem. Soc. 2017, 139, 2318–2328. (23) Xie, Y.; Ge, Y.; Peng, Q.; Li, C.; Li, Q.; Li, Z. How the Molecular Packing Affects the Room Temperature Phosphorescence in Pure Organic Compounds: Ingenious Molecular Design, Detailed Crystal Analysis, and Rational Theoretical Calculations. Adv. Mat. 2017, 29, 1606829–n/a. (24) Hoja, J.; Reilly, A. M.; Tkatchenko, A. First-principles mod-

(27)

(28) (29)

(30)

(31) (32)

(33)

(34)

(35)

(36)

(37) (38)

(39) (40) (41)

(42)

(43)

(44) (45)

(46) (47) (48) (49)

(50)

eling of molecular crystals: structures and stabilities, temperature and pressure. WIRES: Comput. Mol. Sci. 2017, 7, e1294–n/a. Jeziorski, B.; Moszynski, R.; Szalewicz, K. Perturbation Theory Approach to Intermolecular Potential Energy Surfaces of van der Waals Complexes. Chem. Rev. 1994, 94, 1887–1930. Szalewicz, K. Symmetry-adapted perturbation theory of intermolecular forces. WIREs Comput. Mol. Sci. 2012, 2, 254–272. Jeziorski, B.; Moszynski, R.; Ratkiewicz, A.; Rybak, S.; Szalewicz, K.; Williams, H. L. In Methods and Techniques in Computational Chemistry: METECC94 ; Clementi, E., Ed.; STEF: Cagliari, 1993; Vol. B (Medium-Size Systems); p 79. Kitaura, K.; Morokuma, K. New Energy Decomposition Scheme for Molecular-Interactions Within Hartree-Fock Approximation. Int. J. Quantum Chem. 1976, 10, 325–340. Bagus, P. S.; Hermann, K.; Bauschlicher, C. W. A New Analysis of Charge Transfer and Polarization for Ligand-Metal Bonding: Model Studies of Al4 CO and Al4 NH3 . J. Chem. Phys. 1984, 80, 4378–4386. Azar, R. J.; Head-Gordon, M. An energy decomposition analysis for intermolecular interactions from an absolutely localized molecular orbital reference at the coupled-cluster singles and doubles level. J. Chem. Phys. 2012, 136, 024103. Williams, H. L.; Chabalowski, C. F. Using Kohn-Sham Orbitals in Symmetry-Adapted Perturbation Theory to Investigate Intermolecular Interactions. J. Phys. Chem. A 2001, 106, 646. Heßelmann, A.; Jansen, G. The helium dimer potential from a combined density functional theory and symmetry-adapted perturbation theory approach using an exact exchangecorrelation potential. Phys. Chem. Chem. Phys. 2003, 5, 5010. Misquitta, A. J.; Podeszwa, R.; Jeziorski, B.; Szalewicz, K. Intermolecular potentials based on symmetry-adapted perturbation theory with dispersion energies from time-dependent density-functional calculations. J. Chem. Phys. 2005, 123, 214103. Heßelmann, A.; Jansen, G.; Sch¨ utz, M. Density-functional Theory-symmetry-adapted Intermolecular Perturbation Theory with Density Fitting: A New Efficient Method to Study Intermolecular Interaction Energies. J. Chem. Phys. 2005, 122, 014103. Podeszwa, R.; Cencek, W.; Szalewicz, K. Efficient Calculations of Dispersion Energies for Nanoscale Systems from Coupled Density Response Functions. J. Chem. Theory Comput. 2012, 8, 1963–1969. Rybak, S.; Jeziorski, B.; Szalewicz, K. Many-body symmetryadapted perturbation theory of intermolecular interactions. H[sub 2]O and HF dimers. J. Chem. Phys. 1991, 95, 6576– 6601. Patkowski, K.; Szalewicz, K.; Jeziorski, B. Third-order interactions in symmetry-adapted perturbation theory. J. Chem. Phys. 2006, 125, 154107. Williams, H. L.; Szalewicz, K.; Moszynski, R.; Jeziorski, B. Dispersion energy in the coupled pair approximation with noniterative inclusion of single and triple excitations. J. Chem. Phys. 1995, 103, 4586–4599. Korona, T.; Jeziorski, B. Dispersion energy from density-fitted density susceptibilities of singles and doubles coupled cluster theory. J. Chem. Phys. 2008, 128, 144107. Korona, T. First-order exchange energy of intermolecular interactions from coupled cluster density matrices and their cumulants. J. Chem. Phys. 2008, 128, 224104. Korona, T. Second-order exchange-induction energy of intermolecular interactions from coupled cluster density matrices and their cumulants. Phys. Chem. Chem. Phys. 2008, 10, 6509–6519. Korona, T. Exchange-Dispersion Energy: A Formulation in Terms of Monomer Properties and Coupled Cluster Treatment of Intramonomer Correlation. J. Chem. Theory Comput. 2009, 5, 2663–2678. Korona, T. A coupled cluster treatment of intramonomer electron correlation within symmetry-adapted perturbation theory: benchmark calculations and a comparison with a densityfunctional theory description. Mol. Phys. 2013, 111, 3705– 3715. Whitten, J. L. Coulombic Potential-Energy Integrals and Approximations. J. Chem. Phys. 1973, 58, 4496–4501. Dunlap, B. I.; Connolly, J. W. D.; Sabin, J. R. Applicability of LCAO-X-alpha Methods to Molecules Containing Transitionmetal Atoms - Nickel Atom and Nickel Hydride. Int. J. Quantum Chem. Symp. 1977, 12, 81. Dunlap, B. I.; Connolly, J. W. D.; Sabin, J. R. On some approximations in applications of X alpha theory. J. Chem. Phys. 1979, 71, 3396–3402. Vahtras, O.; Alml¨ of, J.; Feyereisen, M. W. Integral Approximations for LCAO-SCF Calculations. Chem. Phys. Lett. 1993, 213, 514–518. Feyereisen, M.; Fitzgerald, G.; Komornicki, A. Use of Approximate Integrals in Ab Initio Theory. An Application in MP2 Calculations. Chem. Phys. Lett. 1993, 208, 359–363. Rendell, A. P.; Lee, T. J. Coupled-cluster Theory Employing Approximate Integrals: An Approach to Avoid the Input/Output and Storage Bottlenecks. J. Chem. Phys. 1994, 101, 400–408. Kendall, R. A.; Fruchtl, H. A. The Impact of the Resolution

ACS Paragon Plus Environment 10

Page 10 of 22

Page 11 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(51) (52)

(53)

(54) (55) (56)

(57)

(58) (59)

(60)

(61)

(62)

(63) (64) (65) (66) (67) (68) (69) (70)

(71)

(72) (73) (74) (75)

(76)

Journal of Chemical Theory and Computation of the Identity Approximate Integral Method On Modern Ab Initio Algorithm Development. Theor. Chem. Acc. 1997, 97, 158–163. Weigend, F. A Fully Direct RI-HF Algorithm: Implementation, Optimized Auxiliary Basis Sets, Demonstration of Accuracy and Efficiency. Phys. Chem. Chem. Phys. 2002, 4, 4285–4291. Hohenstein, E. G.; Sherrill, C. D. Density fitting and Cholesky decomposition approximations in symmetry-adapted perturbation theory: Implementation and application to probe the nature of pi-pi interactions in linear acenes. J. Chem. Phys. 2010, 132, 184111. Hohenstein, E. G.; Parrish, R. M.; Sherrill, C. D.; Turney, J. M.; Schaefer, H. F. Large-scale symmetry-adapted perturbation theory computations via density fitting and Laplace transformation techniques: Investigating the fundamental forces of DNA-intercalator interactions. J. Chem. Phys. 2011, 135, 174107. Heßelmann, A.; Korona, T. Intermolecular symmetry-adapted perturbation theory study of large organic complexes. J. Chem. Phys. 2014, 141, 094107. Hohenstein, E. G.; Sherrill, C. D. Density fitting of intramonomer correlation effects in symmetry-adapted perturbation theory. J. Chem. Phys. 2010, 133, 014101. Hohenstein, E. G.; Jaeger, H. M.; Carrell, E. J.; Tschumper, G. S.; Sherrill, C. D. Accurate Interaction Energies for Problematic Dispersion-Bound Complexes: Homogeneous Dimers of NCCN, P2, and PCCP. J. Chem. Theory Comput. 2011, 7, 2842–2851. Parrish, R. M.; Hohenstein, E. G.; Sherrill, C. D. Tractability gains in symmetry-adapted perturbation theory including coupled double excitations: CCD+ST(CCD) dispersion with natural orbital truncations. J. Chem. Phys. 2013, 139, 174102. Maurer, S. A.; Beer, M.; Lambrecht, D. S.; Ochsenfeld, C. Linear-scaling symmetry-adapted perturbation theory with scaled dispersion. J. Chem. Phys. 2013, 139, 184104. Papajak, E.; Zheng, J.; Xu, X.; Leverentz, H. R.; Truhlar, D. G. Perspectives on Basis Sets Beautiful: Seasonal Plantings of Diffuse Basis Functions. J. Chem. Theory Comput. 2011, 7, 3027–3034. Parker, T. M.; Burns, L. A.; Parrish, R. M.; Ryno, A. G.; Sherrill, C. D. Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency and performance for interaction energies. J. Chem. Phys. 2014, 140, 094106. Parrish, R. M.; Sherrill, C. D. Spatial assignment of symmetry adapted perturbation theory interaction energy components: The atomic SAPT partition. J. Chem. Phys. 2014, 141, 044115. Parrish, R. M.; Parker, T. M.; Sherrill, C. D. Chemical Assignment of Symmetry-Adapted Perturbation Theory Interaction Energy Components: The Functional-Group SAPT Partition. J. Chem. Theory Comput. 2014, 10, 4417–4431. Lillestolen, T. C.; Wheatley, R. J. Redefining the atom: atomic charge densities produced by an iterative stockholder approach. Chem. Commun. 2008, 2008, 5909–5911. Lillestolen, T. C.; Wheatley, R. J. Atomic charge densities generated using an iterative stockholder procedure. J. Chem. Phys. 2009, 131, 144101. Wheatley, R. J.; Gopal, A. A. Covalent bond orders and atomic anisotropies from iterated stockholder atoms. Phys. Chem. Chem. Phys. 2012, 14, 2087–2091. Hirshfeld, F. L. Bonded-atom Fragments for Describing Molecular Charge Densities. Theor. Chim. Acta 1977, 44, 129–138. Bultinck, P.; Van Alsenoy, C.; Ayers, P. W.; Carb´ o-Dorca, R. Critical analysis and extension of the Hirshfeld atoms in molecules. J. Chem. Phys. 2007, 126, 144111. Knizia, G. Intrinsic Atomic Orbitals: An Unbiased Bridge between Quantum Theory and Chemical Concepts. J. Chem. Theory Comput. 2013, 9, 4834–4843. Lehtola, S.; J´ onsson, H. Pipek–Mezey Orbital Localization Using Various Partial Charge Estimates. J. Chem. Theory Comput. 2014, 10, 642–649. Pipek, J.; Mezey, P. G. A Fast Intrinsic Localization Procedure Applicable for Ab Initio and Semiempirical Linear Combination of Atomic Orbital Wave-Functions. J. Chem. Phys. 1989, 90, 4916–4926. Parrish, R. M.; Sherrill, C. D. Quantum-Mechanical Evaluation of versus Substituent Interactions in Stacking: Direct Evidence for the WheelerHouk Picture. J. Am. Chem. Soc. 2014, 136, 17386–17389. Wheeler, S. E. Understanding Substituent Effects in Noncovalent Interactions Involving Aromatic Rings. Acc. Chem. Res. 2013, 46, 1029–1038. Parrish, R. M.; Sitkoff, D. F.; Cheney, D. L.; Sherrill, C. D. The Surprising Importance of Peptide Bond Contacts in DrugProtein Interactions. Chem. Eur. J. 2017, 23, 7887–7890. Turney, J. M. et al. PSI4: An Open-Source Ab Initio Electronic Structure Program. WIREs Comput. Mol. Sci. 2012, 2, 556– 565. Parrish, R. M. et al. Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability. J. Chem. Theory Comput. 2017, 13, 3185–3197. Alikhani, M.; Fuster, F.; Silvi, B. What Can Tell the Topological Analysis of ELF on Hydrogen Bonding? Struct. Chem.

2005, 16, 203–210. (77) Contreras-Garca, J.; Recio, J. Electron delocalization and bond formation under the ELF framework. Theor. Chem. Acc. 2011, 128, 411–418. (78) Johnson, E. R.; Keinan, S.; Mori-Snchez, P.; ContrerasGarca, J.; Cohen, A. J.; Yang, W. Revealing Noncovalent Interactions. J. Am. Chem. Soc. 2010, 132, 6498–6506. (79) Contreras-Garcia, J.; Johnson, E. R.; Keinan, S.; Chaudret, R.; Piquemal, J.-P.; Beratan, D. N.; Yang, W. NCIPLOT: A Program for Plotting Noncovalent Interaction Regions. J. Chem. Theory Comput. 2011, 7, 625–632. (80) de Silva, P.; Corminboeuf, C. Simultaneous Visualization of Covalent and Noncovalent Interactions Using Regions of Density Overlap. J. Chem. Theory Comput. 2014, 10, 3745–3756. (81) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. The Journal of Chemical Physics 2010, 132, 154104. (82) Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. Journal of Computational Chemistry 2011, 32, 1456–1465. (83) Podeszwa, R.; Pernal, K.; Patkowski, K.; Szalewicz, K. Extension of the HartreeFock Plus Dispersion Method by First-Order Correlation Effects. J. Phys. Chem. Lett. (84) Hesselmann, A. Comparison of Intermolecular Interaction Energies from SAPT and DFT Including Empirical Dispersion Contributions. J. Phys. Chem. A 2011, 115, 11321–11330. (85) Lao, K. U.; Herbert, J. M. Accurate Intermolecular Interactions at Dramatically Reduced Cost: XPol+SAPT with Empirical Dispersion. J. Phys. Chem. Lett. 2012, 3, 3241–3248. (86) Ufimtsev, I. S.; Mart´ınez, T. J. Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation. J. Chem. Theory Comput. 2008, 4, 222–231. (87) Ufimtsev, I. S.; Mart´ınez, T. J. Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation. J. Chem. Theory Comput. 2009, 5, 1004–1015. (88) Ufimtsev, I. S.; Mart´ınez, T. J. Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics. J. Chem. Theory Comput. 2009, 5, 2619–2628. (89) Luehr, N.; Ufimtsev, I. S.; Mart´ınez, T. J. Dynamic Precision for Electron Repulsion Integral Evaluation on Graphical Processing Units (GPUs). J. Chem. Theory Comput. 2011, 7, 949–954. (90) Titov, A. V.; Ufimtsev, I. S.; Luehr, N.; Mart´ınez, T. J. Generating Efficient Quantum Chemistry Codes for Novel Architectures. J. Chem. Theory Comput. 2013, 9, 213–221. (91) Bista, M.; Wolf, S.; Khoury, K.; Kowalska, K.; Huang, Y.; Wrona, E.; Arciniega, M.; Popowicz, G. M.; Holak, T. A.; D¨ omling, A. Transient Protein States in Designing Inhibitors of the MDM2-p53 Interaction. Structure 2013, 21, 2143–2151. (92) Chen, Z.; Li, Y.; Chen, E.; Hall, D. L.; Darke, P. L.; Culberson, C.; Shafer, J. A.; Kuo, L. C. Crystal structure at 1.9-A resolution of human immunodeficiency virus (HIV) II protease complexed with L-735,524, an orally bioavailable inhibitor of the HIV proteases. J. Bio. Chem. 1994, 269, 26344–26348. (93) Alml¨ of, J.; Faegri, K.; Korsell, K. Principles for a direct SCF approach to LCAO–MO ab-initio calculations. J. Comp. Chem. 1982, 3, 385–399. (94) Alml¨ of, J. Elimination of Energy Denominators in MøllerPlesset Perturbation Theory by a Laplace Transform Approach. Chem. Phys. Lett. 1991, 181, 319–320. (95) H¨ aser, M.; Alml¨ of, J. Laplace Transform Techniques in MøllerPlesset Perturbation Theory. J. Chem. Phys. 1992, 96, 489– 494. (96) Braess, D.; Hackbusch, W. Approximation of 1/x by exponential sums in [1, ∞). IMA J. Numer. Anal. October 2005, 25, 685–697. (97) Takatsuka, A.; Ten-no, S.; Hackbusch, W. Minimax approximation for the decomposition of energy denominators in Laplace-transformed M[o-slash]ller–Plesset perturbation theories. J. Chem. Phys. 2008, 129, 044112. (98) Hohenstein, E. G.; Parrish, R. M.; Mart´ınez, T. J. Tensor hypercontraction density fitting. I. Quartic scaling secondand third-order M[o-slash]ller-Plesset perturbation theory. J. Chem. Phys. 2012, 137, 044103. (99) Parrish, R. M.; Hohenstein, E. G.; Mart´ınez, T. J.; Sherrill, C. D. Tensor hypercontraction. II. Least-squares renormalization. J. Chem. Phys. 2012, 137, 224106. (100) Parrish, R. M.; Hohenstein, E. G.; Schunck, N. F.; Sherrill, C. D.; Mart´ınez, T. J. Exact Tensor Hypercontraction: A Universal Technique for the Resolution of Matrix Elements of Local Finite-Range N -Body Potentials in Many-Body Quantum Problems. Phys. Rev. Lett. 2013, 111, 132505. (101) Riley, K. E.; Hobza, P. Assessment of the MP2 Method, along with Several Basis Sets, for the Computation of Interaction Energies of Biologically Relevant Hydrogen Bonded and Dispersion Bound Complexes. J. Phys. Chem. A 2007, 111, 8257– 8263. (102) Pulay, P. Convergence acceleration of iterative sequences. the case of scf iteration. Chem. Phys. Lett. 1980, 73, 393 – 398. (103) Isborn, C. M.; Goetz, A. W.; Clark, M. A.; Walker, R. C.; Martinez, T. J. Electronic Absorption Spectra from MM and Ab Initio QM/MM Molecular Dynamics: Environmental Effects

ACS Paragon Plus Environment 11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(104) (105)

(106)

(107)

(108)

(109)

on the Absorption Spectrum of Photoactive Yellow Protein. J. Chem. Theory Comput. 2012, 8, 5092–5106. Liao, R. Z.; Thiel, W. Convergence in the QM-Only and QM/MM Modeling of Enzymatic Reactions: A Case Study for Acetylene Hydratase. J. Comput. Chem. 2013, 34, 2389–2397. Sadeghian, K.; Flaig, D.; Blank, I. D.; Schneider, S.; Strasser, R.; Stathis, D.; Winnacker, M.; Carell, T.; Ochsenfeld, C. Ribose-Protonated DNA Base Excision Repair: A Combined Theoretical and Experimental Study. Anfew. Chem., Int. Ed. 2014, 53, 10044–10048. Kulik, H. J.; Zhang, J.; Klinman, J. P.; Martinez, T. J. How Large Should the QM Region be in QM/MM Calculations? The Case of Catechol O-Methyltransferase. J. Phys. Chem. B 2016, 120, 11381–11394. Flaig, D.; Beer, M.; Ochsenfeld, C. Convergence of Electronic Structure with the Size of the QM Region: Example of QM/MM NMR Shieldings. J. Chem. Theory Comput. 2012, 8, 2260–2271. Hartman, J. D.; Neubauer, T. J.; Caulkins, B. G.; Mueller, L. J.; Beran, G. J. Converging Nuclear Magnetic Shielding Calculations with Respect to Basis and System Size in Protein Systems. J. Biomol. NMR 2015, 62, 327–340. Harris, T. V.; Szilagyi, R. K. Protein Environmental Effects on Iron-Sulfur Clusters: A Set of Rules for Constructing Computational Models for Inner and Outer Coordination Spheres. J. Comput. Chem. 2016, 37, 1681–1696.

ACS Paragon Plus Environment 12

Page 12 of 22

Page 13 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1. Schematic of the drug-protein interactions and F-SAPT functional group fragmentation patterns considered in this work. (A) 4MDQ, with 2× 28W ligands bound to the 86-residue MDM2 protein. (B) 1HSG, with 1× MK1 indinavir ligand bound to the 198-residue HIV-II protease. In (A) and (B) protein is shown as ribbons and ligands are shown in ball and stick representation. Protein side chains and water molecules are omitted for clarity. (C) F-SAPT fragmentation scheme for 28W. (D) F-SAPT fragmentation scheme for MK1. (E) F-SAPT fragmentation scheme for a general polypeptide chain. See text for treatment of proline and disulfide units. Table 1. Wall Times (in hours) for F-SAPT/6-31G** for 4MDQ (run on a node with 8× NVIDIA GTX 970 GPUs, 2× Intel Xeon E5-2643 quad-core CPUs @ 3.30 GHz, and 400 GB RAM) and F-SAPT/6-31G* for 1HSG (run on a node with 6× NVIDIA GTX 1080 Ti GPUs, 2× Intel Xeon E5-2643v4 quad-core CPUs @ 3.40 GHz, and 768 GB RAM). For 4MDQ, the left column shows the timings for the case where the ligand/protein are treated as monomers A/B, respectively (the optimized case). The right column shows the timings for the case where the ligand/protein are treated as monomers B/A, respectively.

Term Dimer RHF Ligand RHF Protein RHF Ligand CPHF Protein CPHF Order-0 F-SAPT Fragmentation Order-1 F-SAPT Order-2 F-SAPT DFT-D3 Total

4MDQ(A@B)/6-31G** 4.56 2.98 3.64 0.75 0.96 2.36 0.99 0.25 3.92 0.01 20.44

4MDQ(B@A)/6-31G** 4.56 2.98 3.64 0.75 0.96 2.37 1.03 0.25 64.33 0.01 80.88

ACS Paragon Plus Environment 13

1HSG/6-31G* 8.50 5.16 8.50 1.29 1.51 4.25 5.26 0.49 4.24 0.02 39.22

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 22

Figure 2. Order-1 F-SAPT/6-31G** plots for 4MDQ interaction energy contributions. All atoms in each fragment are colored according to the order-1 contribution for the full fragment. The colormap saturates at ±5 kcal mol−1 .

ACS Paragon Plus Environment 14

Page 15 of 22

All Residues

101 100

|∆E|

[kcal mol−1 ]

102

10-1 5

10

15

100 10-1 5

10

15

20 Peptide Links

5

10

15

20 Waters

5

10

15

20

101 100

|∆E|

[kcal mol−1 ]

10-2 102

20 Side Chains

Elst Exch IndAB IndBA Disp SAPT

101

|∆E|

[kcal mol−1 ]

10-2 102

10-1

[kcal mol−1 ]

10-2 102 101 100

|∆E|

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

10-1 10-2

R[ ]

Figure 3. Absolute residual vs. full 4MDQ interaction energy at F-SAPT/6-31G**, with fragments B sorted by closest contact R.

ACS Paragon Plus Environment 15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 22

Figure 4. Order-1 difference F-SAPT/6-31G** plots for 4MDQ upon substitution with Cl vs. Me. Negative (red) indicates a preference for Cl, positive (blue) indicates a preference for Me. All atoms in each fragment are colored according to the order-1 contribution for the full fragment. The colormap saturates at differences of ±5 kcal mol−1 .

ACS Paragon Plus Environment 16

Page 17 of 22

All Residues

100 10-1

|∆E|

[kcal mol−1 ]

101

5

10

15

10-1 5

10

15

20 Peptide Links

5

10

15

20 Waters

5

10

15

20

100 10-1

|∆E|

[kcal mol−1 ]

10-2 101

20 Side Chains

Elst Exch IndAB IndBA Disp SAPT

100

|∆E|

[kcal mol−1 ]

10-2 101

[kcal mol−1 ]

10-2 101 100 10-1

|∆E|

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

10-2

R[ ]

Figure 5. Absolute cumulative errors vs. full 4MDQ difference interaction energy at F-SAPT/6-31G**, with fragments B sorted by closest contact R.

ACS Paragon Plus Environment 17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 22

Figure 6. Order-2 difference F-SAPT plots for 4MDQ difference interaction energy contributions. All atoms in each fragment are colored according to the order-2 contribution for the full fragment. The colormap saturates at ±5 kcal mol−1 . Terms/ligand fragments not shown have visually negligible contributions to the difference interaction energy.

ACS Paragon Plus Environment 18

Page 19 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 7. Order-1 F-SAPT/6-31G* plots for 1HSG interaction energy contributions. All atoms in each fragment are colored according to the order-1 contribution for the full fragment. The colormap saturates at ±5 kcal mol−1 .

ACS Paragon Plus Environment 19

Journal of Chemical Theory and Computation

All Residues

101 100

|∆E|

[kcal mol−1 ]

102

10-1 5

10

15

20

100 10-1 5

10

15

20

5

10

15

20

5

10

15 R[ ]

20

25 Peptide Links

101 100

|∆E|

[kcal mol−1 ]

10-2 102

25 Side Chains

Elst Exch IndAB IndBA Disp SAPT

101

|∆E|

[kcal mol−1 ]

10-2 102

10-1

[kcal mol−1 ]

10-2 102

Waters

25

101 100

|∆E|

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

10-1 10-2

25

Figure 8. Absolute cumulative errors vs. full 1HSG interaction energy at F-SAPT/6-31G*, with fragments B sorted by closest contact R.

ACS Paragon Plus Environment 20

Page 20 of 22

Page 21 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 9. Order-2 F-SAPT/6-31G* plots for 1HSG interaction energy contributions. All atoms in each fragment are colored according to the order-2 contribution for the full fragment. The colormap saturates at ±5 kcal mol−1 .

ACS Paragon Plus Environment 21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

F-SAPT +

Figure 10.

TOC Figure. For TOC use only.

ACS Paragon Plus Environment 22

Page 22 of 22