A Fragment Quantum Mechanical Method for Metalloproteins - Journal

Jan 8, 2019 - Accurate energy calculation of metalloprotein is of crucial importance and also a theoretical challenge. In this work, a metal molecular...
1 downloads 0 Views 1MB Size
Subscriber access provided by University of Winnipeg Library

Biomolecular Systems

A Fragment Quantum Mechanical Method for Metalloproteins Mingyuan Xu, Xiao He, Tong Zhu, and John Z.H. Zhang J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00966 • Publication Date (Web): 08 Jan 2019 Downloaded from http://pubs.acs.org on January 11, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A Fragment Quantum Mechanical Method for Metalloproteins Mingyuan Xua, Xiao Hea,b, Tong Zhua,b* and John Z. H. Zhanga,b,c* aShanghai

Engineering Research Center of Molecular Therapeutics & New Drug Development, School of

Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China bNYU-ECNU

Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China

cDepartment

of Chemistry, New York University, New York 10003, United States

Email: [email protected], [email protected]

Abstract Accurate energy calculation of metalloprotein is of crucial importance and also a theoretical challenge. In this work, a metal molecular fractionation with conjugate caps (Metal-MFCC) approach is developed for efficient linear-scaling quantum calculation of potential energy and atomic forces of metalloprotein. In this approach, the potential energy of a given protein is calculated by linear combination of potential energies of the neighboring residues, two-body interaction energy between non-neighboring residues that are spatially in close contact and the potential energy of the metal binding group. The calculation of each fragment is embedded in a field of point charges representing the remaining protein environment. Numerical studies were carried out to check the performance of this method, and the calculated potential energies and atomic forces all show excellent agreement with the full system calculations at the M06-2X/6-31G(d) level. By combining the energy calculation with molecular dynamic simulation, we performed an ab initio structural optimization for a zinc finger protein with high efficiency. The present Metal-MFCC approach is linearscaling with a low pre-factor and trivially parallelizable. The individual fragment typically contains about 50 atoms, and it is thus possible to be calculated at higher levels of quantum chemistry method. This fragment method can be routinely applied to perform structural optimization and ab initio molecular dynamic simulation for metalloproteins of any size.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 21

1. Introduction It is estimated that nearly half of all known proteins contain metal ions, and they play critical roles from protein structure stabilization to enzyme catalysis, signal transduction, nitrogen fixation, photosynthesis and respiration1, 2. Thus, understanding the molecular basis of metal-protein interaction is of fundamental importance3-5. In the past decades, many researchers have devoted themselves to the study of metal-protein interactions. Compared to experimental methods, theoretical modeling can simulate dynamical properties of metalloproteins, offer atomic details at various levels and help us better interpret chemical phenomena in metalloproteins and provide new molecular insight. At present, there are two fundamental strategies for modeling metalloprotein systems: classical molecular mechanics (MM) and quantum mechanics (QM) methods. The MM method has a significant advantage in terms of computational efficiency. However, the accuracy of MM method highly depends on the pre-defined molecular force fields. Unfortunately, most exiting force fields are generally incapable of properly describing the interactions between metal ions and proteins, and this has already become a practical obstacle for molecular modeling studies of metalloproteins. In their recent work6, Friedman and co-workers calculated the interaction energies between Zn2+ and its ligands in several representative complexes using both QM and MM methods. It was found that MM calculations that neglect or only approximate polarization could not reproduce even the relative order of the QM interaction energies of these complexes. In fact, numerous studies have suggested that, to accurately investigate metal-protein interactions, it is essential to include both polarization and charge transfer (CT) effects in the force field. Recently, several force fields for metalloproteins were introduced, such as the SIBFA model of Gresh et al.7, 8, the CTPOL model of Lim et al.9, 10, the SLEF model of Wu et al11, the AMOEBA model of Ren et al.12, 13, the 12-6-4 LJ-type non-bonded model of Li and Merz14, the ABEEM of Yang et al.15, the Drude oscillator model of Roux et al.16, a new CT model of Rick et al.17, and the QPCT model for zinc proteins in our previous work18. In these force fields, the polarization and charge transfer effects were included to some degrees, so the calculated results are clearly improved. A comprehensive review of the modeling of metalloproteins using classical mechanics can be found in a recent review of Li and Merz19. Although the above-mentioned models could give better results than traditional force fields, the improvement is not always guaranteed but is highly dependent on the quality of the parameters. In fact, the complicated parametrization procedure is also a drawback of these models. With the development of new quantum chemical methods and rapidly advancing computer technology, one has seen increasing applications of QM methods in the study of metalloproteins20-27. Compared with the MM model, QM calculation could provide more accurate potential energy and structure for metalloproteins. However, the application of QM calculation in macromolecules such as proteins is significantly limited by its computational cost, which increases rapidly with the size of the system being studied. To extend the applicability of rigorous QM methods to large systems, considerable efforts have been made to developing linear-scaling and/or fragmentation methods28-49. The fragmentation approach based on the chemical locality of molecular systems, which assumes that a local region of a given system is only weakly influenced by atoms that are far away from it. In these fragmentation methods, a given macromolecular system is divided ACS Paragon Plus Environment

Page 3 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

into subsystems whose properties are calculated separately on QM level, and finally the property of the whole system can be obtained by taking proper linear combination of the properties of individual fragments. The fragmentation method has several advantages, such as easy parallelization and no need to modify the existing QM programs. More importantly, as the size of each fragment is relatively small, the fragmentbased methods can be applied at higher levels of ab initio electronic structure theories. A range of fragmentation approaches for the QM calculation of large molecular systems has been proposed in recent years, a comprehensive review of these methods can be found in two review articles, written by Gordon50 and Collins51. In our previous work32, an electrostatically embedded generalized molecular fractionation with conjugate caps (EE-GMFCC) method was developed for accurate calculation of protein energy. Systematic tests have demonstrated the accuracy and high efficiency of the EE-GMFCC method at different levels of theory, as compared to the conventional full-system QM calculations for proteins. Furthermore, this approach had also been successfully utilized for the ab initio molecular dynamic (AIMD) simulation of proteins in explicit solvent31. In this work, we present a Metal-MFCC (metal molecular fractionation with conjugate caps) method for accurate calculation of the potential energy and atomic forces of metalloproteins. This paper is organized as follows. Section 2 provides a description of the Metal-MFCC approach. In section 3, we performed benchmark tests for a wide range of metalloprotein systems containing different metal ions, and compare the calculated results with those from the corresponding conventional full-system QM calculations. We also extended the Metal-MFCC method to a short ab initio structural optimization of a zinc finger protein. Finally, a brief summary is given in section 4.

2. Theoretical Methods: The Metal-MFCC method mainly has two features. Firstly, a given protein is divided into subsystems (fragments) which are calculated separately by QM method, and finally the energy/atomic forces of the whole system are obtained by the combination of these properties of individual fragments. Secondly, several types of conjugate caps are introduced to saturate the dangling bonds of the fragments and preserve their local chemical environment. The double counting caused by these conjugate caps are subsequently removed. In the current version of Metal-MFCC, we focus on most common metalloproteins that contain mononuclear metal centers. A given metalloprotein can be decomposed into two parts: the metal binding group (MBG) and the remaining part of the protein (P ′ , Fig. 1(a)). To avoid dangling bonds, a set of “metal conjugate molecular caps” (Mconcap, including Mcap* and Mcap) are designed to saturate the cutoff groups. The calculation procedure of Metal-MFCC is divided into three steps. Firstly, the energy of the capped metal ion binding group (Mcaps*-MBG) is calculated under the background charges of P ′ as shown in Fig.1(b). Secondly, the capped P′ (P′-Mcaps, Fig. 1(c)) is cut into small molecular fragments, and the energy of each fragment is calculated under the background charge of the metal ion. Finally, the double counting due to conjugate caps is removed and the potential energy of the entire system is obtained from the calculated energies of all fragments. The definitions of Mcap*, Mcap and Mconcap can be found in Fig. 1. The final ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 21

energy of the system is expressed as: ~

~

E Metal-MFCC = E(Mcaps * -MBG) + E(P ' - Mcaps) - E Mconcap DC

(1)

Figure 1. (a) In Metal-MFCC, a given metalloprotein is divided into two parts: metal binding group (MBG, included in the blue dashed circle) and the remaining part of the protein (P′). (b) The atomic structure of Mcaps*-MBG and four Mcaps* are shown in blue. (c) The atomic structure of P′-Mcaps and four Mcaps are shown in red. (d) The atomic structures of four pairs of Mconcap. Many studies have shown that the electrostatic polarization plays a critical role in metalloprotein systems10, 17, 18. Therefore, in the Metal-MFCC approach, QM calculation of each fragment is embedded in the electrostatic field of the point charges representing the remaining part of the system to account for the ~

environmental polarization effect. In Eq. (1), E( x) represents the energy of the fragment x and its interaction energy with the background charges representing the rest of the entire system: N MM ~ Z  (r )   E( x) = E( x) +  q k    dr x  Rk  r  k   R k  R

(2)

, where E(x) is the quantum self-energy of the subsystem x (with background effect), NMM is the number of background charges, qk and Rk is, respectively, the charge and coordinates of the kth background charge, Zα ACS Paragon Plus Environment

Page 5 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

and Rα are, respectively, the nuclear charge and coordinates of the αth nuclei in fragment x.  x (r ) represent ~

the electron density distribution of fragment x. E( x ) represents either the first or second term in Eq. (1) as well as their decomposition terms to be discussed in the following. The introduction of Mconcaps and electrostatic embedding will cause double counting which needs to be removed by the term ( E Mconcap ) in Eq. (1). Details of the calculations of these various energy terms are DC described below. ~

2.1 Calculation of E( Mcaps *  MBG ) In the Metal-MFCC approach, if the distance between any atoms in a residue that forms a coordination bond with the metal ion is 2.6Å or shorter, the side chain or main chain which contains the coordinated atom of this particular residue is treated as a member of the metal binding group (MBG). The cutoff distance 2.6Å was chosen because it covers metal-ligand bond distances in most common metalloproteins.52,

53

For

example, there are two histidine and two cysteine residues that coordinate to the zinc ion in a CCHH type zinc finger as shown in Fig. 1(a), then the side chains of these four residues and the zinc ions are all included in the MBG. To preserve the valence of the chemical bonds and mimic the local chemical environment of MBG and P′, the Mconcaps are used to cap them. Mconcap contains 2 parts: Mcap for P′ and Mcap* for MBG. As shown in Fig. 1(b-c), MBG is capped with four Mcaps* and P′ is capped with four corresponding Mcaps. For the studied zinc finger protein, Mcaps are the side chains belonging to the MBG and Mcaps* are simply several methyl groups whose positions are determined by the coordinates of the C α atoms of corresponding residues. Hydrogen atoms are added to avoid dangling bonds. ~

2.2 Calculation of E(P ' Mcaps) . As can be seen from Fig.1, the P′-Mcaps part is actually the pure protein after removing the metal ion. In the current Metal-MFCC method, a given P′-Mcaps part with N amino acids (defined as A1 A2 A3 ... AN ) is partitioned into N individual amino acid units by cutting through the peptide bonds as illustrated in Fig. 2(a). To preserve the local chemical environment of each separated amino acid, a pair of molecular conjugate caps (Cap and Cap*) is used to cap each fragment and hydrogen atoms are added to terminate the molecular caps to avoid dangling bonds (see Fig. 2)32.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 21

Figure 2. (a), The peptide bond is cut in the upper panel and the fragments is capped with Capi1 and its conjugate Cap*i in the middle panel (b), where i represents the index of the ith amino acid in the given protein. The atomic structure of the concap is shown in the bottom panel (c), defined as the fused molecular species

Cap*i  Capi1 .

Figure 3. Generalized concap (Gconcap) scheme in the upper panel and the atomic structure of the Gconcap in the bottom panel. The Gconcap in Metal-MFCC is defined as an H-CO-NH-CαHR-H structure instead of normal H-saturated residues to preserve the electron delocalization across the peptide bond.

ACS Paragon Plus Environment

Page 7 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A generalized molecular concap (Gconcap30, as shown in Fig. 3) was introduced to account for the twobody QM interaction energies between short-range non-neighboring residues. If the distance of any two atoms between residue Ai and Aj is less than or equal to a distance threshold λ, the interaction between these two residues are described by quantum mechanics. Classical force field is introduced to describe the longrange interaction between distant non-neighboring fragments. Note that the definition of Gconcap in the present Metal-MFCC method is different from that in our previous study.30, 32 Here we use the H–CO–NH– CαHR–H structure as the Gconcap instead of the H-saturated residues in order to better describe the electron ~

delocalization across the peptide bond and improve the accuracy of force calculation. Thus E (P'-Mcaps) is given by the following formula, ~

~

~

~

E(P' - Mcaps) = E fragment  E concap + E two-body  E protein DC

(3)

, where various terms are given below. N-1 ~

~

E fragment =  E(Cap*i -1A i Capi 1 )

(4)

i =2

N-2 ~

~

E concap =  E(Cap*i Capi 1 )

(5)

i =2

~

E two-body =



i , j i  2, R i -R j  

E protein =  DC k ,l m , n

~ ~ ~  E( A i A j )  E(A i )  E(A j )  

q m( k ) q n(l ) R m( k )n(l )



(6)

q m'(i ) q n'( j )

R  m',n'

i , j  i  2, R i -R j  

(7)

m'( i )n'( j )

~

where i and j denote the ith and jth residues, respectively. The first term E fragment in Eq. (3) is the summation ~

of energies of all capped fragments as shown in Eq. (4). The second term E concap in Eq. (3) is the summation of energies of all concaps as shown in Eq. (5). The third term in Eq. (3) is the two-body interaction terms described above as illustrated in Eq. (6). These energy terms are all calculated with electrostatic embedding. The last term in Eq. (3) is the energy correction term that approximately cancels the double counting due to the use of concap and Gconcap by classical Coulomb interactions as shown in Eq. (7). The detailed definitions of k, l, and atomic charges qm(k) and qn(l) are given in ref.32, except that in Metal-MFCC, the metal ion is included in the definition of l. 2.3 Removal of double counting due to the use of Mconcaps To remove double counting due to the use of Mconcaps, we subtract the following energy term (the last term in Eq. (1)) from the total energy:

E

Mconcaps DC

=

N Mconcap ~

 i

E(Mconcapi )-

N Mconcap

    , 



m 'Mconcap n 'Mconcap 

q m 'q n ' R m 'n '

ACS Paragon Plus Environment

(8)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 21

One should note that, if side chains of two neighboring residues were included in the same metal binding group, we simply treat these two side chains as one fragment in the MBG. 2.4 Calculation of atomic forces The force acting on an atom m are obtained by calculating the gradient of the energy EMetal-MFCC with respect to its nuclear coordinates,

Fm =  m E Metal-MFCC

(9)

2.5 Computation details All the QM calculations in this study were performed with the Gaussian 0954 software and the M06-2X functional. The 6-31G(d) basis set was used for the C, O, N, H and S elements. For comparison, both the 631G(d) basis set and the LANL2DZ pseudo potential were used for metal ions. To reduce the computational cost, the SCF convergence criterion in the QM calculation was set to 10-6 and the integration was set to “Fine grid”. The background charges used in the electrostatic embedding were taken from the Amber ff14SB force field55. A Linux cluster containing 20 computational nodes was used for this study, each node has 20 CPU cores (dual E5-2680v2, 2.80GHz). In the structural optimization, the convergence criterion used in the steepest decent gradient method is 1.0*10-4 kcal/(mol Å). Three small proteins were used to validate the definition of the Gconcap. The first one is a linear peptide ACE-(ALA)9-NME constructed by the Tleap module in the Amber16 software, the second and third are a designed α-helix and tryptophan zipper 2 downloaded from the protein data bank (PDB ID: 2I9M and 1LE1), respectively. Fifteen metalloproteins containing Zn2+, Cu2+ and Cu+ ions selected from the protein data bank (PDB ID: 1AT1, 1B8T, 1ZE9, 2ZNF, 2L44, 1A4A, 1AAC, 1J5D, 1ZE9, 1AZB, 1A8Z, 1B3I, 2K70, 4DP1 and 2LEL) were used to benchmark the Metal-MFCC method. The Amber16 software was used to pre-prepare all of these proteins before the study.

3. Results and Discussions 3.1 The performance of Metal-MFCC on pure proteins To validate the Metal-MFCC method, we firstly checked its performance on three pure proteins (nonmetalloproteins) and compared the results with that calculated by EE-GMFCC method32. As shown in Table 1, the popential energies calculated by both Metal-MFCC and EE-GMFCC agree very well with the conventional full-system QM calculations, the deviations are all less than 10kcal/mol. The calculated atomic forces are also compared and shown in Fig. 4. Table 1. Potential energies of three proteins obtained from conventional full-system QM calculations, EEGMFCC and Metal-MFCC. All calculations were performed at the M06-2X/6-31G(d) level. System

Protein type

ACE-ALA -NME Linear protein 9

Energy deviation from full-system calculation (kcal/mol) EE-GMFCC Metal-MFCC

Atom No.

Full-system (au.)

102

-2473.42236

8.53

8.53

2I9M

α-helix

237

-5942.57296

2.92

-4.47

1LE1

β-sheet

194

-4929.80169

-1.42

-1.29

ACS Paragon Plus Environment

Page 9 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 4. Correlation between atomic forces of three proteins (ACE-(ALA)9-NME linear peptide, 1LE1, 2I9M) calculated by full-system QM calculation and that calculated by Metal-MFCC (blue) and EE-GMFCC (red). All the QM calculations were performed at the M06-2X/6-31G(d) level. For the ACE-ALA9-NMR linear peptide, both Metal-MFCC and EE-GMFCC show excellent agreements with the full-system QM calculation. However, for the 2I9M (with α-helix secondary structure) and 1LE1 (with β-sheet secondary structure) systems, the performance of Metal-MFCC is obviously better than that of EE-GMFCC. The largest deviation of the calculated atomic forces between EE-GMFCC and full-system QM is -11.74kcal/(mol* Å), which corresponds to the amide N atom of LYS8 in 2I9M. Further analysis shows that the EE-GMFCC failed to calculate the force of atoms in the backbone that form hydrogen bonds. After careful checking, we found that the disagreement is mainly due to the improper definition of Gconcap. In EE-GMFCC, the Gconcap was defined as H-saturated residues32, which can’t preserve the electron delocalization across the peptide bond. The shortcomings of this definition will be more obvious when there is hydrogen bond formed between the main-chain carboxyl and amide group of two residues. To solve this problem, we modified the definition of Gconcap in Metal-MFCC (as illustrated in Fig. 3) to avoid cutting the peptide bond. The force deviation of the amide N atom in LYS8 between MetalMFCC and full-system QM calculation is only 3.95kcal/(mol* Å), which is much smaller than that of EEGMFCC. 3.2 The performance of Metal-MFCC on metalloproteins. After confirming the performance of Metal-MFCC on pure proteins, we checked whether it works well for metalloproteins. Varieties of proteins containing Zn2+, Cu2+ and Cu+ ions were chosen as their importance ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 21

in the biological system. Firstly, we double-checked the influence of the two-body cutoff threshold (λ) on the convergence of the calculated potential energy. Fig. 5 shows the calculated potential energies of six selected metalloproteins as a function of λ, potential energies calculated by full-system QM method are taken as reference. As can be seen, the potential energies of all systems are close to convergence at λ = 4.5 Å, thus this threshold value was adopted for all following calculations. Then we compared the potential energies calculated by Metal-MFCC with that calculated by full-system QM method for more metalloproteins. The results are shown in Table 2. As can be seen, the largest deviation of potential energy is less than 10 kcal/mol and the mean unsigned error is only 3.23 kcal/mol. Table 2. Potential energies of 15 metalloproteins calculated by Metal-MFCC and full-system QM method at the M06-2X/6-31G(d) level. Atom

Metal

1AT1

Zn2+

376

-12303.74464

-12303.73044

8.91

1B8T

Zn2+

427

-14037.46486

-14037.46929

-2.77

1ZE9

Zn2+

266

-8804.56829

-8804.57254

-2.67

2ZNF

Zn2+

284

-9722.08154

-9722.08482

-2.06

2L44

Zn2+

307

-10806.88538

-10806.88786

-1.56

1A4A

Cu2+

357

-11549.93159

-11549.93304

-0.91

1AAC

Cu2+

398

-12614.16955

-12614.17480

-3.29

1J5D

Cu2+

405

-12294.15976

-12294.15286

4.33

1ZE9

Cu2+

266

-8665.68119

-8665.68514

-2.47

1AZB

Cu2+

410

-12969.10544

-12969.10649

-0.66

1A8Z

Cu+

424

-13004.79564

-13004.78567

6.26

1B3I

Cu+

436

-13003.87887

-13003.87450

2.74

2K70

Cu+

541

-15813.12616

-15813.13730

-6.99

4DP1

Cu+

349

-11079.92941

-11079.92976

-0.21

2LEL

Cu+

518

-15921.89420

-15921.89002

2.62

No.

Efull-system(au.)

EMetal-MFCC

PDB ID

(λ=4.5Å, au.)

MUE

ΔE (kcal/mol)

3.23

ACS Paragon Plus Environment

Page 11 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 5. Relative potential energies (in kcal/mol) calculated by Metal-MFCC are shown as a function of the distance threshold of two-body interaction for several metalloproteins. Potential energies calculated by fullsystem QM method are taken as reference. Fig. 6 depicts the correlation of atomic forces calculated by full-system QM calculation and MetalMFCC. Just like results of pure proteins, excellent agreements can be seen. All correlation coefficients are greater than 0.99 and the largest RMSE value of these systems is 1.05kcal/(mol* Å) (Table 3).

Figure 6. Correlation between atomic forces calculated by full-system QM calculation and Metal-MFCC.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 21

Table 3. Comparison of atomic forces calculated by Metal-MFCC and full-system QM for six selected metalloproteins. The MAXE, MAE and RMSE denotes the max error, mean signed error and root mean squared error, respectively.

PDB ID

Metal Cations

Atom No.

MAXE (kcal/mol/Å)

MAE (kcal/mol/Å)

RMSE (kcal/mol/Å)

1B8T

Zn2+

427

5.29

-0.26

0.88

1ZE9

Zn2+

266

4.80

-0.12

0.81

1A4A

Cu2+

357

6.55

-0.20

1.05

1AAC

Cu2+

398

4.73

-0.24

0.86

1B3I

Cu+

436

6.87

-0.23

0.95

4DP1

Cu+

349

5.76

-0.24

0.85

5.67

0.21

0.90

MUE

The computational cost of Metal-MFCC and full-system QM calculation are also compared and the results are shown in Table 4. For studied systems containing Zn2+, the advantage of Metal-MFCC in computational cost is not very obvious. The time needed to calculate the largest system 1B8T with fullsystem QM is 0.72h, which is 1.43 times slow than that of Metal-MFCC. For studied systems containing copper, the advantage of Metal-MFCC in computational cost is more obvious, which maybe because that the interaction between copper and protein is more complicit than that of zinc, thus the QM calculation is more difficult to converge. For example, the full-system QM calculation for protein 1A4A needs 44 SCF iterations to achieve convergence. While in the Metal-MFCC calculation for the same system, 36 cycles of SCF iterations are needed for the largest fragment that contains the metal-binding group and 15 cycles SCF iterations in average are needed for the rest relative smaller fragments. It should be noted that the calculation of each individual fragments in Metal-MFCC can be conducted independently, which makes it suitable for massively parallel computations, thus if the calculations were performed on more CPU cores, its advantage in efficiency will be more obvious. In addition, with the increase of the size of the studied system, the computational cost of the full-system QM calculation increases rapidly, while the Metal-MFCC method almost scales linearly with the size of the system.

ACS Paragon Plus Environment

Page 13 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 4. Comparison of the computational cost of Metal-MFCC and full-system QM calculation on a single linux node with two E5-2680v2 CPUs (10 cores, 2.80GHz) for 15 metalloprotein systems. All calculations were performed with Gaussian 09 at the M06-2X/6-31G(d) level and the 2-body cutoff is 4.5Å.

System

Metal

Computational cost (hour)

Atom No.

Metal-MFCC

Full-system QM

Time Ratio

1ZE9

Zn2+

266

0.40

0.30

0.75

2ZNF

Zn2+

284

0.43

0.43

1.0

2L44

Zn2+

307

0.35

0.400

1.14

1AT1

Zn2+

376

0.40

0.63

1.58

1B8T

Zn2+

427

0.50

0.72

1.43

1ZE9

Cu2+

266

0.62

1.22

1.97

1A4A

Cu2+

357

0.67

3.07

4.60

1AAC

Cu2+

398

0.73

2.45

3.34

1J5D

Cu2+

405

0.57

2.43

4.29

1AZB

Cu2+

410

0.60

2.52

4.19

4DP1

Cu+

349

0.47

1.08

2.32

1A8Z

Cu+

424

0.53

1.52

2.84

1B3I

Cu+

436

0.52

1.38

2.68

2LEL

Cu+

518

0.88

3.83

4.34

2K70

Cu+

541

0.65

1.27

1.95

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 21

In the above discussions, we have shown that the Metal-MFCC method has good agreement with fullsystem QM results for both potential energy and atomic forces. However, in MD simulation or structural optimization, accurate calculation of the relative potential energy between different protein conformations is more relevant than absolute energies. Thus, we applied the Metal-MFCC method to calculate the relative potential energies of different conformers of three metalloproteins (1ZE9, 1A4A, 4DP1, with Zn2+, Cu2+ and Cu+ ions, respectively). For each protein, 20 different conformations were selected. Fig. 7 shows the calculated results at the M06-2X/6-31G(d) level. As can be seen, the energy profile based on Metal-MFCC is consistent with the one calculated from full-system QM method.

Figure 7. Comparison of the relative potential energies of 20 conformers of three metalloproteins (1ZE9, 1A4A, 4DP1, with Zn2+, Cu2+, Cu+, respectively) between the full-system QM calculations and Metal-MFCC at M06-2X/6-31G(d) level. The potential energy of the first conformer calculated using each method was taken as zero. In the QM calculations, pseudopotentials are widely used for molecular systems containing transition metal ions to improve the computational efficiency. In this work, we also checked the performance of MetalMFCC with the LANL2DZ pseudopotential. In the calculation, the M06-2X functional was chosen and the LANL2DZ pseudopotential was used for metal ions, while the other atoms were treated with 6-31G(d) basis set. As shown in Table S1 and Figure S1, the potential energies of 9 metalloproteins evaluated with MetalMFCC show great agreement with the corresponding full system results at M06-2X/LANL2DZ(6-31G(d)) level. And the atomic forces also agree well with the full QM calculations, the max RMSE of atomic forces for the studied 6 systems is 1.11 kcal/(mol* Å). 3.3 Many body expansion of the metal binding group. As discussed above, a main advantage of the fragment methods over full-system QM calculation is its high efficiency in parallelization. The computational efficiency is thus determined by the number of atoms in each fragment. In order to further improve the efficiency of the Metal-MFCC method, we tested whether it ACS Paragon Plus Environment

Page 15 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

is possible to divide the MBG to smaller fragments. Two fragment schemes were tested. In scheme 1, all Mconcaps in the MBG including the metal itself are defined as individual fragments and the energy of capped MBG is expressed as two-body expansion of these fragments:

E(Mcaps* -MBG)=

N Mconcap +1 ~

 

E +



~

~

~

(E  E  E  )

(10)

 ,  

Where α and β represent the index of the Mconcaps and metal ion in the MBG. The calculated results of this scheme are shown in Table S2. As can be seen, the deviation of calculated potential energies from that calculated by full-system QM calculation under this scheme is much larger than that of standard MetalMFCC. It seems that this scheme cannot describe the many body effect of metal binding site well. Including three-body interaction may be a better choice, but it will obviously increase the computational cost. Inspired by the EE-MB method developed by Krubanov et al.56, 57, we designed a new fragment scheme for MBG (scheme 2). In this scheme, the metal cation and its closest ligand with the negative charge are defined as one fragment, and then represent the MBG with two-body expansion. As can be seen from Table S2, the results of this scheme show significant improvement from that of scheme 1, but still not as accurate as that of standard Metal-MFCC. The MUE of this scheme is 9.16 kcal/mol, which is slightly larger than that of MetalMFCC (3.59 kcal/mol) but much smaller than that of scheme 1 (49.96 kcal/mol). This fragment scheme should be useful for big MBG or metalloproteins that has many MBGs. 3.4 Application of Metal-MFCC on structural optimization of a small metalloprotein. In the past decades, the inaccuracy of the traditional force fields has been a practical obstacle for molecular modeling study of metalloproteins. It is well known that the dynamic result of a MD simulation is critically determined by the potential energy surface. Since the Metal-MFCC method can give more accurate potential energy and atomic forces for metalloproteins at QM level with high efficiency, it could be employed to perform ab initio molecular dynamic simulations (AIMD) or structural optimization for metalloproteins. In this kind of calculation, the energy and atomic forces are calculated by Metal-MFCC, whereas the nuclear dynamics of the system is described by classical mechanics. Due to the limitation of the computational cost, it is difficult to perform long time MD simulations for metalloproteins using the current approach at present. Thus we performed a 100 step structural optimization for a small zinc-finger protein (32 residues and one zinc ion, as shown in Fig. 8) with the steepest decent gradient method. For comparison, another minimization was also performed in which the Metal-MFCC was substituted by full-system QM calculation. The relative energy changes as a function of iterations are shown in Fig. 8. One can see that the potential energy with Metal-MFCC converged almost at the same time as full-system QM calculation. Two energy curves are almost the same with each other. The relative energy with Metal-MFCC is converged at 164 kcal/mol, while the relative energy with full-system QM converged at -169 kcal/mol. The RMSD between the final converged structures from these two methods is only 0.016Å. For each step, the relative energy deviation between Metal-MFCC and full-system QM calculation is within 8 kcal/mol. The above results show that Metal-MFCC is potentially powerful and attractive for studying dynamic properties of ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 21

metalloprotein. As compared with classical force fields, important QM effects such as electrostatic polarization and charge transfer are included intrinsically in the Metal-MFCC method, which is significantly important for metalloproteins. In addition, Metal-MFCC method is more balanced without any priori structural biases that are inherent in some predefined force fields.

Figure 8. Structural minimization with both full-system QM and Metal-MFCC methods for a small zincfinger protein. The red line represents the relative energy with full-system QM calculation and the blue line represents the result with Metal-MFCC. The potential energy of the initio structure was taken as zero.

4

Conclusion: In this paper, the Metal-MFCC approach was presented based on our previous works. This method

focuses on efficient linear-scaling quantum calculation of potential energy and atomic forces for metalloproteins. In this approach, the total energy of metalloprotein is calculated by taking a linear combination of the QM energy of the neighboring residues, the two-body interaction energy between nonneighboring residues that are spatially in close contact and the energy of the metal binding group. The QM calculations of all fragments are embedded in a field of point charges representing the remaining protein environment. Numerical studies were carried out to calculate the potential energies and atomic forces of a series of metalloproteins with different metal ions at the M06-2X/6-31G(d) level. For potential energy, the overall mean unsigned error for 15 systems is 3.23 kcal/mol with reference to the conventional full-system QM calculation. In addition, the Metal-MFCC method adopt a new Gconcap scheme to describe interactions between non-neighboring residues that are spatially in close contact, and thus has better performance for the calculation of atomic forces than the EE-GMFCC method developed in our previous study. The mean unsigned error of calculated atomic forces for the studied 6 systems is only 0.90 kcal/(mol*Å) with reference to the full-system QM calculation. Based on this method, the AIMD simulation was performed to minimize a

ACS Paragon Plus Environment

Page 17 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

small zinc-finger protein and the results also shows excellent agreement with the full-system QM calculation. The Metal-MFCC method is linear-scaling and trivially parallelizable. Its computational cost is independent of the size of the studied system if the computational resources increases linearly. In addition, as each fragment is relatively small, it is possible to carry out QM calculation at higher QM level such as MP2. Further development of this approach will focus on two aspects. The first is to further extend the metalloproteins to which the Metal-MFCC method is applicable. In the current version of Metal-MFCC, we mainly focus on the most common metalloproteins that contain mono-nuclear metal centers and ions like Zn2+, Cu2+ and Cu+. We will continue to develop this method so that it can be used in more complex metalloproteins which contain multi-nuclear metal centers and/or heavier metals such as Fe, Ni and Cd. The second is to further improve the efficiency of the Metal-MFCC method. For example, the precompiled Gaussian 09 software was used for all QM calculations in this work for convenience, which can be changed to other faster software packages or more efficient algorithms such as TeraChem58 which can run QM calculations on graphics processing unit (GPU). With further development and continued advance in computer technology, the Metal-MFCC method will become more and more practical, and it would be attractive to apply this method to perform structure optimization, AIMD simulation, and to calculate other important QM properties for metalloproteins. Research along these directions and applications is underway in our laboratory.

Acknowledgment: This work was supported by the Ministry of Science and Technology of China (Grant No. 2016YFA0501700), the National Natural Science Foundation of China (Grants No. 91641116, 91753103 and 21433004), Innovation Program of Shanghai Municipal Education Commission (201701070005E00020), and NYU Global Seed Grant. We thank the Supercomputer Center of East China Normal University for providing us computer time.

Supporting information: Potential energies and atomic forces calculated with pseudo potential and many-body expansion of metal binding group. This information is available free of charge via the Internet at http://pubs.acs.org

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 21

References 1. Waldron, K. J.; Robinson, N. J., How do bacterial cells ensure that metalloproteins get the correct metal?. Nat. Rev. Microbiol 2009, 7, 25-35. 2. Lovell, T.; Himo, F.; Han, W. G.; Noodleman, L., Density functional methods applied to metalloenzymes. Coord. Chem. Rev. 2003, 238, 211-232. 3. Stevens, F. C., Calmodulin: an introduction. Can. J. Biochem. Cell. Biol. 1983, 61, 906-910. 4. Hoovers, J. M.; Mannens, M.; John, R.; Bliek, J.; van Heyningen, V.; Porteous, D. J.; Leschot, N. J.; Westerveld, A.; Little, P. F., High-resolution localization of 69 potential human zinc finger protein genes: a number are clustered. Genomics 1992, 12, 254-263. 5. Marshall, N. M.; Garner, D. K.; Wilson, T. D.; Gao, Y. G.; Robinson, H.; Nilges, M. J.; Lu, Y., Rationally tuning the reduction potential of a single cupredoxin beyond the natural range. Nature 2009, 462, 113-116. 6. Ahlstrand, E.; Hermansson, K.; Friedman, R., Interaction Energies in Complexes of Zn and Amino Acids: A Comparison of Ab Initio and Force Field Based Calculations. J. Phys. Chem. A 2017, 121, 2643-2654. 7. Gresh, N., Energetics of Zn2+ Binding to a Series of Biologically Relevant Ligands - a Molecular Mechanics Investigation Grounded on Ab-Initio Self-Consistent-Field Supermolecular Computations. J. Comput. Chem. 1995, 16, 856-882. 8. Gresh, N.; de Courcy, B.; Piquemal, J. P.; Foret, J.; Courtiol-Legourd, S.; Salmon, L., Polarizable Water Networks in Ligand-Metalloprotein Recognition. Impact on the Relative Complexation Energies of Zn-Dependent Phosphomannose Isomerase with D-Mannose 6Phosphate Surrogates. J. Phys. Chem. B 2011, 115, 8304-8316. 9. Sakharov, D. V.; Lim, C., Force Fields Including Charge Transfer and Local Polarization Effects: Application to Proteins Containing Multi/Heavy Metal Ions. J. Comput. Chem. 2009, 30, 191-202. 10. Sakharov, D. V.; Lim, C., Zn protein simulations including charge transfer and local polarization effects. J. Am. Chem. Soc. 2005, 127, 4921-4929. 11. Wu, R.; Lu, Z.; Cao, Z.; Zhang, Y., A Transferable Non-bonded Pairwise Force Field to Model Zinc Interactions in Metalloproteins. J. Chem. Theory Comput. 2011, 7, 433-443. 12. Wu, J. C.; Piquemal, J. P.; Chaudret, R.; Reinhardt, P.; Ren, P. Y., Polarizable Molecular Dynamics Simulation of Zn(II) in Water Using the AMOEBA Force Field. J. Chem. Theory Comput. 2010, 6, 2059-2070. 13. Zhang, J. J.; Yang, W.; Piquemal, J. P.; Ren, P. Y., Modeling Structural Coordination and Ligand Binding in Zinc Proteins with a Polarizable Potential. J. Chem. Theory Comput. 2012, 8, 1314-1324. 14. Li, P.; Merz, K. M., Jr., Taking into Account the Ion-induced Dipole Interaction in the Nonbonded Model of Ions. J. Chem. Theory Comput. 2014, 10, 289-297. 15. Yang, Z. Z.; Cui, B. Q., Atomic charge calculation of metallobiomolecules in terms of the ABEEM method. J. Chem. Theory Comput. 2007, 3, 1561-1568. 16. Lemkul, J. A.; Huang, J.; Roux, B.; MacKerell, A. D., An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983-5013. 17. Soniat, M.; Hartman, L.; Rick, S. W., Charge Transfer Models of Zinc and Magnesium in Water. J. Chem. Theory Comput. 2015, 11, 1658-1667. 18. Zhu, T.; Xiao, X.; Ji, C.; Zhang, J. Z., A New Quantum Calibrated Force Field for ZincProtein Complex. J. Chem. Theory Comput. 2013, 9, 1788-1798. 19. Li, P. F.; Merz, K. M., Metal Ion Modeling Using Classical Mechanics. Chem. Rev. 2017, 117, 1564-1686. 20. van den Bosch, M.; Swart, M.; Snijders, J. G.; Berendsen, H. J.; Mark, A. E.; Oostenbrink, C.; van Gunsteren, W. F.; Canters, G. W., Calculation of the redox potential of the protein azurin and some mutants. ChemBioChem 2005, 6, 738-746. 21. Cascella, M.; Magistrato, A.; Tavernelli, I.; Carloni, P.; Rothlisberger, U., Role of protein ACS Paragon Plus Environment

Page 19 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

frame and solvent for the redox properties of azurin from Pseudomonas aeruginosa. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 19641-19646. 22. Paraskevopoulos, K.; Sundararajan, M.; Surendran, R.; Hough, M. A.; Eady, R. R.; Hillier, I. H.; Hasnain, S. S., Active site structures and the redox properties of blue copper proteins: atomic resolution structure of azurin II and electronic structure calculations of azurin, plastocyanin and stellacyanin. Dalton Trans. 2006, 3067-3076. 23. Manesis, A. C.; Shafaat, H. S., Electrochemical, Spectroscopic, and Density Functional Theory Characterization of Redox Activity in Nickel-Substituted Azurin: A Model for Acetyl-CoA Synthase. Inorg. Chem. 2015, 54, 7959-7967. 24. Zanetti-Polzi, L.; Bortolotti, C. A.; Daidone, I.; Aschi, M.; Amadei, A.; Corni, S., A few key residues determine the high redox potential shift in azurin mutants. Org. Biomol. Chem 2015, 13, 11003-11013. 25. Ali-Torres, J.; Marechal, J. D.; Rodriguez-Santiago, L.; Sodupe, M., Three dimensional models of Cu2+-Aβ(1-16) complexes from computational approaches. J. Am. Chem. Soc. 2011, 133, 15008-15014. 26. Mujika, J. I.; Pedregal, J. R. G.; Lopez, X.; Ugalde, J. M.; Rodriguez-Santiago, L.; Sodupe, M.; Marechal, J. D., Elucidating the 3D structures of Al(III)-Aβ complexes: a template free strategy based on the pre-organization hypothesis. Chem. Sci 2017, 8, 5041-5049. 27. Mirats, A.; Ali-Torres, J.; Rodriguez-Santiago, L.; Sodupe, M., Stability of transient Cu+Aβ (1-16) species and influence of coordination and peptide configuration on superoxide formation. Theor. Chem. Acc. 2016, 135, 1-9. 28. Li, S.; Li, W.; Fang, T., An efficient fragment-based approach for predicting the ground-state energies and structures of large molecules. J. Am. Chem. Soc. 2005, 127, 7215-7226. 29. Li, S.; Li, W.; Ma, J., Generalized energy-based fragmentation approach and its applications to macromolecules and molecular aggregates. Acc. Chem. Res. 2014, 47, 2712-2720. 30. He, X.; Zhang, J. Z., The generalized molecular fractionation with conjugate caps/molecular mechanics method for direct calculation of protein energy. J. Chem. Phys. 2006, 124, 184703. 31. Liu, J.; Zhu, T.; Wang, X.; He, X.; Zhang, J. Z., Quantum Fragment Based ab Initio Molecular Dynamics for Proteins. J. Chem. Theory Comput. 2015, 11, 5897-5905. 32. Wang, X.; Liu, J.; Zhang, J. Z.; He, X., Electrostatically embedded generalized molecular fractionation with conjugate caps method for full quantum mechanical calculation of protein energy. J. Phys. Chem. A 2013, 117, 7149-7161. 33. Zhu, T.; He, X.; Zhang, J. Z., Fragment density functional theory calculation of NMR chemical shifts for proteins with implicit solvation. Phys. Chem. Chem. Phys. 2012, 14, 7837-7845. 34. Zhu, T.; Zhang, J. Z.; He, X., Automated Fragmentation QM/MM Calculation of Amide Proton Chemical Shifts in Proteins with Explicit Solvent Model. J. Chem. Theory Comput. 2013, 9, 2104-2114. 35. Zhang, D. W.; Zhang, J. Z. H., Molecular fractionation with conjugate caps for full quantum mechanical calculation of protein-molecule interaction energy. J. Chem. Phys. 2003, 119, 35993605. 36. Fedorov, D. G.; Ishimura, K.; Ishida, T.; Kitaura, K.; Pulay, P.; Nagase, S., Accuracy of the three-body fragment molecular orbital method applied to Moller-Plesset perturbation theory. J. Comput. Chem. 2007, 28, 1476-1484. 37. Fedorov, D. G.; Kitaura, K., Second order Moller-Plesset perturbation theory based upon the fragment molecular orbital method. J. Chem. Phys. 2004, 121, 2483-2490. 38. Fedorov, D. G.; Kitaura, K., Extending the power of quantum chemistry to large systems with the fragment molecular orbital method. J. Phys. Chem. A 2007, 111, 6904-6914. 39. Babu, K.; Gadre, S. R., Ab initio quality one-electron properties of large molecules: development and testing of molecular tailoring approach. J. Comput. Chem. 2003, 24, 484-495. 40. Ganesh, V.; Dongare, R. K.; Balanarayan, P.; Gadre, S. R., Molecular tailoring approach for geometry optimization of large molecules: energy evaluation and parallelization strategies. J. Chem. Phys. 2006, 125, 104109. ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 21

41. Collins, M. A.; Deev, V. A., Accuracy and efficiency of electronic energies from systematic molecular fragmentation. J. Chem. Phys. 2006, 125, 104104. 42. Mullin, J. M.; Roskop, L. B.; Pruitt, S. R.; Collins, M. A.; Gordon, M. S., Systematic fragmentation method and the effective fragment potential: an efficient method for capturing molecular energies. J. Phys. Chem. A 2009, 113, 10040-10049. 43. Dahlke, E. E.; Truhlar, D. G., Electrostatically embedded many-body correlation energy, with applications to the calculation of accurate second-order Moller-Plesset perturbation theory energies for large water clusters. J. Chem. Theory Comput. 2007, 3, 1342-1348. 44. Dahlke, E. E.; Truhlar, D. G., Electrostatically embedded many-body expansion for large systems, with applications to water clusters. J. Chem. Theory Comput. 2007, 3, 46-53. 45. Dahlke, E. E.; Truhlar, D. G., Electrostatically embedded many-body expansion for simulations. J. Chem. Theory Comput. 2008, 4, 1-6. 46. Exner, T. E.; Mezey, P. G., The field-adapted ADMA approach: Introducing point charges. J. Phys. Chem. A 2004, 108, 4301-4309. 47. Exner, T. E.; Mezey, P. G., Evaluation of the field-adapted ADMA approach: absolute and relative energies of crambin and derivatives. Phys. Chem. Chem. Phys. 2005, 7, 4061-4069. 48. Liu, J.; Herbert, J. M., Pair-Pair Approximation to the Generalized Many-Body Expansion: An Alternative to the Four-Body Expansion for ab Initio Prediction of Protein Energetics via Molecular Fragmentation. J. Chem. Theory Comput. 2016, 12, 572-584. 49. He, X.; Zhu, T.; Wang, X. W.; Liu, J. F.; Zhang, J. Z. H., Fragment Quantum Mechanical Calculation of Proteins and Its Applications. Acc. Chem. Res. 2014, 47, 2748-2757. 50. Gordon, M. S.; Fedorov, D. G.; Pruitt, S. R.; Slipchenko, L. V., Fragmentation Methods: A Route to Accurate Calculations on Large Systems. Chem. Rev. 2012, 112, 632-672. 51. Collins, M. A.; Bettens, R. P. A., Energy-Based Molecular Fragmentation Methods. Chem. Rev. 2015, 115, 5607-5642. 52. Rulisek, L.; Vondrasek, J., Coordination geometries of selected transition metal ions (Co2+, Ni2+, Cu2+, Zn2+, Cd2+, and Hg2+) in metalloproteins. J. Inorg. Biochem. 1998, 71, 115-127. 53. Kuppuraj, G.; Dudev, M.; Lim, C., Factors governing metal-ligand distances and coordination geometries of metal complexes. J. Phys. Chem. B 2009, 113, 2952-2960. 54. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.; Marenich, A. V.; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.; Ortiz, J. V.; Izmaylov, A. F.; Sonnenberg, J. L.; Williams; Ding, F.; Lipparini, F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.; Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Throssell, K.; Montgomery Jr., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M. J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian 09 Rev. D.01; Wallingford, CT, 2016. 55. Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C., ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J Chem. Theory Comput. 2015, 11, 3696-3713. 56. Kurbanov, E. K.; Leverentz, H. R.; Truhlar, D. G.; Amin, E. A., Electrostatically Embedded Many-Body Expansion for Neutral and Charged Metalloenzyme Model Systems. J. Chem. Theory Comput. 2012, 8, 1-5. 57. Kurbanov, E. K.; Leverentz, H. R.; Truhlar, D. G.; Amin, E. A., Analysis of the Errors in the Electrostatically Embedded Many-Body Expansion of the Energy and the Correlation Energy for Zn and Cd Coordination Complexes with Five and Six Ligands and Use of the Analysis to Develop a Generally Successful Fragmentation Strategy. J. Chem. Theory Comput. 2013, 9, 2617-2628. 58. Ufimtsev, I. S.; Martinez, T. J., Quantum Chemistry on Graphical Processing Units. 3. ACS Paragon Plus Environment

Page 21 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics. J Chem. Theory Comput. 2009, 5, 2619-2628.

TOC graphic:

ACS Paragon Plus Environment