Electrostatically Embedded Generalized Molecular Fractionation with

Mar 1, 2013 - East China Normal University, Shanghai 200062, China. ‡. Department of Chemistry, New York University, New York, New York 10003, Unite...
1 downloads 0 Views 3MB Size
Article pubs.acs.org/JPCA

Electrostatically Embedded Generalized Molecular Fractionation with Conjugate Caps Method for Full Quantum Mechanical Calculation of Protein Energy Xianwei Wang,† Jinfeng Liu,† John Z. H. Zhang,†,‡ and Xiao He*,† †

State Key Laboratory of Precision Spectroscopy and Department of Physics, Institute of Theoretical and Computational Science, East China Normal University, Shanghai 200062, China ‡ Department of Chemistry, New York University, New York, New York 10003, United States S Supporting Information *

ABSTRACT: An electrostatically embedded generalized molecular fractionation with conjugate caps (EE-GMFCC) method is developed for efficient linear-scaling quantum mechanical (QM) calculation of protein energy. This approach is based on our previously proposed GMFCC/MM method (He; et al. J. Chem. Phys. 2006, 124, 184703), In this EE-GMFCC scheme, the total energy of protein is calculated by taking a linear combination of the QM energy of the neighboring residues and the two-body QM interaction energy between non-neighboring residues that are spatially in close contact. All the fragment calculations are embedded in a field of point charges representing the remaining protein environment, which is the major improvement over our previous GMFCC/MM approach. Numerical studies are carried out to calculate the total energies of 18 real three-dimensional proteins of up to 1142 atoms using the EE-GMFCC approach at the HF/6-31G* level. The overall mean unsigned error of EE-GMFCC for the 18 proteins is 2.39 kcal/mol with reference to the full system HF/6-31G* energies. The EE-GMFCC approach is also applied for proteins at the levels of the density functional theory (DFT) and secondorder many-body perturbation theory (MP2), also showing only a few kcal/mol deviation from the corresponding full system result. The EE-GMFCC method is linear-scaling with a low prefactor, trivially parallel, and can be readily applied to routinely perform structural optimization of proteins and molecular dynamics simulation with high level ab initio electronic structure theories.

1. INTRODUCTION Because it is still not practical to apply standard full system quantum mechanical (QM) methods for total energy calculation of proteins due to the large size of the systems, accurate and efficient quantum calculation of protein energy presents a grand challenge in computational chemistry and biology. To extend the applicability of rigorous quantum mechanical method to large systems, considerable effort has been made to the development of various linear-scaling and/or fragmentation methods over the past decade for energy calculation of macromolecules.1−16 Among the existing approaches, the fragmentation approach have become one of the central focuses in developing linear-scaling QM method for large systems (see a recent review by Gordon and co-workers8). In the fragmentation approach, the system is divided into subsystems (fragments) and subsequently the property of the whole system can be obtained by taking a linear combination of the properties of individual fragments. The fragmentation approach takes advantage of the chemical locality of most molecular systems, which assumes that local regions of a system are only weakly influenced by the atoms that are far away from © 2013 American Chemical Society

the region of interest. The fragmentation approach is attractive in several aspects including easy implementation of parallelization without extensively modifying the existing QM programs and straightforward application at all levels of ab initio electronic structure theories. A range of fragment-based methods for quantum calculation of proteins, molecular clusters, or other macromolecular systems have been proposed in recent years, including the fragment molecular orbital (FMO) method of Kitaura and co-workers,4,17−19 the molecular tailoring approach (MTA) of Gadre et al.,20−22 the systematic fragmentation method (SFM) of Collins and coworkers,23−25 the generalized energy-based fragmentation (GEBF) of Li and co-workers,26−28 the electrostatically embedded many-body (EE-MB) expansion approach by Dahlke and Truhlar,29−31 the adjustable density matrix assembler (ADMA) method of Exner and Mezey,5,32,33 and Special Issue: Joel M. Bowman Festschrift Received: January 23, 2013 Revised: February 28, 2013 Published: March 1, 2013 7149

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

to the corresponding full system QM energies. Finally, a brief summary is given in section IV.

the molecular fractionation with conjugate caps (MFCC) method.34−40 In our previous studies, we employed the MFCC approach to calculate the full electron density of the system, which is then used to compute the total energy of protein at the DFT level.34 To make the direct MFCC energy calculation more practical for applications such as structural optimization of protein systems and molecular dynamics simulation, we also proposed an efficient generalized molecular fractionation with conjugate caps/molecular mechanics (GMFCC/MM) scheme.36 In the GMFCC/MM method, molecular force field interactions are introduced to represent the long-range interaction between distant non-neighboring fragments whereas the short-range non-neighboring fragment interactions are calculated by quantum chemistry. The GMFCC/MM method has been shown to give good agreement in torsional energies of several polypeptides with the full system quantum calculation. In the GMFCC/MM approach, the environmental effect was not included for each fragment calculation. Previous studies5,27,29,31,41,42 have shown that the electronic polarization arising from the environment is important for including the many-body effect in fragmentation methods. In this work, we present an electrostatically embedded generalized molecular fractionation with the conjugate caps (EE-GMFCC) method for accurate calculation of the protein energy. This approach calculates the total energy of the protein system by taking a linear combination of fragment-based energies of neighboring residues and interaction energies between non-neighboring residues (the two residues are not sequentially connected to each other) that are spatially in close contact, with a key element being that all of the calculated fragments are embedded in a field of point charges to mimic the electrostatic environment of the remaining system. Dahlke and Truhlar29−31 and Hirata and co-workers43,44 have successfully applied this electrostatically embedded scheme for molecular clusters by many-body expansion method. In this study, we extend the embedding scheme to study large protein systems by combining with the MFCC approach. The current EE-GMFCC approach is also different from the GEBF26−28 method of Li and co-workers, which is another derivative of the MFCC method. In the GEBF approach, all the nearby residues within a certain distance from the central residue are included in each fragment calculation; therefore, each fragment may have a large subsystem. In contrast, in the EE-GMFCC calculation, only sequentially connected tripeptide, and the two-body interaction between two non-neighboring residues within short distance are explicitly computed by quantum mechanics, whereas the three- and all higher-order Coulomb effects are approximated through the embedding field. As a result, the size of each fragment in EE-GMFCC is much smaller than that in the GEBF method, resulting in a smaller prefactor for the linear-scaling fragmentation method. In this study, we demonstrate that the EE-GMFCC approach is capable of describing the electronic structure of large protein systems at the HF, DFT, and MP2 levels with accuracy comparable to that of the GEBF method. In addition, the present EE-GMFCC method works well in predicting the relative energies of proteins. This paper is organized as follows. Section II provides the detailed description of the EE-GMFCC method for total energy calculations of protein systems. In section III, we performed test calculations for a wide range of protein systems using HF, DFT, and MP2 with the 6-31G* basis set and compared them

2. THEORY AND METHODOLOGY A. EE-GMFCC Method. In the framework of MFCC approach, a given protein with N amino acids (defined as A1A2A3···AN) is decomposed into N individual amino acid units by cutting through the peptide bond as illustrated in Figure 1.

Figure 1. MFCC scheme in which the peptide bond is cut in (a) and the fragments are capped with Capi+1 and its conjugate Cap*i in (b), where subscript i denotes the ith amino acid in the given protein. The atomic structure of concap in (c). The concap is defined as the fused molecular species of Capi* − Capi+1.

To preserve the local chemical environment of the remaining fragment for each separated amino acid, a pair of conjugate caps is designed to saturate each fragment. Hydrogen atoms are added to terminate the molecular caps to avoid dangling bonds (Figure 1). The MFCC energy of the protein can be approximately expressed as the sum of the capped fragments minus that of the conjugate caps, N−1

EMFCC =

N−2

∑ E(Cap*i− 1AiCapi+ 1) − ∑ E(Cap*i Capi+ 1) i=2

i=2

(1)

where i denotes the index for ith residue, E(Capi−1 * AiCapi+1) is the self-energy of the ith residue Ai capped with a left cap Cap*i−1 and a right cap Capi+1, and E(Cap*i Capi+1) is the energy of the ith concap, respectively. The energy of concap is deducted to avoid double counting of the self-energy of the amino acids. However, the EMFCC term only includes the self-energy of individual residue and interaction energy between sequentially connected tripeptides. Therefore, in the previous study,36 we have proposed a mixed quantum-classical GMFCC/MM approach, in which we added the interaction energy between two non-neighboring residues to obtain the total energy expression for proteins. The energy expression of the GMFCC/ MM method is given as follows, 7150

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A EGMFCC/MM = EMFCC +



Article

energy of protein using EE-GMFCC approach is expressed by the following relation (the derivation is provided in the Appendix),

(Eij − Ei − Ej)QM

i ,j>i+2 |R i − R j|≤ λ

+ {∑ (Ekl − Ek − El)MM − k ,l



(Eij − Ei − Ej)MM }

N−1

i ,j>i+2 |R i − R j|≤ λ

E EE − GMFCC =

* Ai Cap ) ̃ ∑ E(Cap i−1 i+1 i=2

(2)

N−2

where Ei, Ej denotes the energy of the residue i and j capped with hydrogen atoms on both sides, respectively. Eij is the energy of dimer consisting of residue i and j. λ is the distance threshold. If the distance of any two atoms between fragment i and j is less than or equal to λ (|Ri − Rj| ≤ λ), the interaction energy between these two fragments are treated by quantum mechanics, otherwise, the interaction energy is calculated by molecular mechanics. As shown in Figure 2, we introduce a



*Cap ) + ∑ ̃ ∑ E(Cap i i+1

(Eij̃ − Eĩ − Ej̃ )QM

i ,j>i+2 |R i − R j|≤ λ

i=2

⎧ ⎪ qm(k)qn(l) ⎪ − ⎨∑ ∑ − R ⎪ ⎪ k , l m , n m(k)n(l) ⎩

∑ ∑ i ,j>i+2 m′,n′ |R i − R j|≤ λ

⎫ qm ′ (i)qn ′ (j) ⎪ ⎪ ⎬ R m ′ (i)n ′ (j) ⎪ ⎪ ⎭

(3)

where Ẽ represents the summation of the self-energy of the fragment and the interaction energy between the fragment and background charges. i and j denote the residue number of the protein. qm(k) represents the point charge of the mth atom in the fragment k. The definition of concap and Gconcap is same as GMFCC/MM (Figures 1 and 2). It is proved in the Appendix that k = (H)Capi−1 * −NH and l = CO − Ai+2Ai+3···AN, (i = 2, 3, ..., N − 2), where k and l represent all the pairs whose interaction energies have been doubly counted in the MFCC ̃ ̃ calculation of ∑N−1 * AiCapi+1) − ∑N−2 i=2 E(Capi−1 i=2 E(Capi*Capi+1) with the embedding scheme. Furthermore, analogously to the GMFCC/MM method, if the distance of any two atoms between residue i and j is less than or equal to the distance threshold λ, the interaction energy between these two residues are calculated by quantum mechanics; otherwise, the double counting of interaction energy between distant non-neighboring residues is deducted by charge−charge interactions approximately, which is similar to the treatment in GEBF method.27 Fragmentation methods have shown that the two-body interactions make the most important contribution to the total energy of protein systems besides the self-energy of each fragment.26,29,45,46 The n-body (n > 2) interaction energies usually have small contributions to the total energy and the computational cost can be significantly high.47 The crucial aspect of this EE-GMFCC method is the electrostatic embedding of the fragment calculation in the rest of the protein environment, which, in this work, consists of atomic partial charges. The embedding scheme ensures the electronic polarization effect to be taken into account. Through the monomer and dimer calculations, one- and two-body kinetic, Coulomb, exchange, and correlation interactions are included nearly exactly when we perform MP2 calculations. Through the embedding scheme, three- and all higher-order Coulomb effects are also included at the HF or DFT level. It is this embedding approach that makes the Bethe−Goldstone expansion rapidly convergent and the truncation after the dimer terms sufficiently accurate for HF and MP2 calculations.44 Previous studies on water clusters by Truhlar and co-workers29,30 have shown that the two-body electron correlation energies recover most of the electron correlation interaction energy of the whole system, because the electron correlation energies are intrinsically shortrange. In our current treatment, each MFCC fragment typically has three residues, and each neighboring concap shown in Figure 1 and non-neighboring Gconcap in Figure 2 contains two residues. Although the total number of fragment pairs is proportional to the square of the number of residues, but the

Figure 2. Generalized concap scheme in (a). The atomic structure of the generalized concap in (b).

generalized concap (Gconcap) to account for the two-body QM interaction energies between sequentially non-neighboring fragments that are spatially in close contact. k and l in eq 2 represent all the pairs whose interaction energies have not been included in the QM calculation of EMFCC. As the conjugate caps are defined in Figure 1, it can be proved that k=(H)Cap*i−1 −NH (where (H) denotes that the added hydrogen atom on the left is excluded) and l = CO − Ai+2Ai+3···AN, (i = 2, 3, ..., N− 2). The major drawback of the GMFCC/MM approach is that the many-body contribution to the total energy from multifragment interactions was neglected. In this study, we utilize the electrostatic embedding scheme on top of the GMFCC/MM method. We name the new approach EEGMFCC, in which all the fragment calculations in GMFCC are embedded in the electrostatic field of the point charges representing the remaining fragments in the protein. The total 7151

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Table 1. Total Energies of 18 Three-Dimensional Globular Proteins Obtained from Standard Full System HF/6-31G* Calculations and EE-GMFCC HF/6-31G* Calculations with Different Distance Threshold λa energy deviation from full system calculation (kcal/mol) system 1AML 2XL1 1BZG 1BBA 2JPK 2PPZ 1VTP 2YSC 2RLK 1WN8 2I9M 1LE1 2L8E 3V1E 3T97 (chain 2LH0 (chain 3L32 (chain 3NTW (chain MUE RMSE

no. of atoms

no. of basis functions

full system (au)

EE-GMFCC (λ = 3.0 Å)

EE-GMFCC (λ = 3.5 Å)

598 243 573 582 589 608 396 578 588 354 246 218 760 706 999

5178 2137 4851 5033 5000 5111 3418 5108 5089 2913 2069 1944 6493 6061 8302

−15141.907936 −6183.350149 −13680.776389 −15103.386877 −13855.225441 −14958.009045 −10015.004475 −14634.634993 −14589.934893 −8878.730735 −6267.195519 −5470.031387 −20270.549968 −18015.494102 −24598.988684

4.28 −0.94 6.77 1.48 5.52 3.15 0.97 5.56 9.60 5.02 4.11 1.97 2.36 −0.31 8.80

0.85 −1.29 3.35 0.63 4.27 2.14 0.94 4.09 7.12 3.88 3.41 1.16 1.63 −1.11 6.58

0.14 −1.34 2.76 0.25 3.34 1.71 0.47 3.66 5.66 3.70 3.47 0.49 0.45 −1.66 5.62

1142

9702

−28500.533406

−0.01

−4.91

738

6264

−18054.990862

2.57

960

8016

−23683.450620

B) A) A) A)

EE-GMFCC (λ = 4.0 Å)

EE-GMFCC (λ = 4.5 Å)

max. no. of atoms in fragments

0.57 −1.08 2.78 0.09 2.99 1.59 0.81 3.02 5.68 3.46 3.34 0.36 −0.35 −2.21 6.10

59 65 70 67 69 63 56 68 64 59 63 60 65 56 66

−5.92

−5.45

63

0.57

−0.45

−0.77

63

4.81

3.45

1.94

1.61

62

3.79 4.66

2.85 3.47

2.39 3.06

2.35 2.99

[33.03] [−18.44] [71.27] [44.91] [−9.64] [54.94] [54.42]

The point changes are taken from the Amber94 force field. The numbers in square brackets are based on energies calculated using the GMFCC method without the embedding field (eq 2). a

was then used to calculate PPC charges by the MFCC computational protocol. The protein was divided into capped amino-acid fragments whereas the rest of the protein fragments were employed as an external point charge for quantumchemical calculations at the QM level. The electrostatic potentials were saved and a standard two-stage RESP fitting procedure was used to fit effective point charges of protein.50,51 The newly fitted atomic charges were passed to the next calculation of charge fitting. The calculated corrected reaction field energy from Delphi54 using the fitted charges is taken as the convergence criterion, and this process is iterated until its variation is smaller than a certain threshold. Usually the criterion will be reached within five iterations. C. Molecular Dynamics (MD) Simulation. We also compared the EE-GMFCC calculated relative energy of protein conformations with the full system calculations. The conformations for three proteins (PDB id: 1VTP, 2KCF, and 1BHI) were selected from MD simulations using the Amber ff99SB force field. During the MD simulation, the protein is placed in a periodic rectangular box of TIP3P water molecules. The distance from the surface of the box to the closest atom of the solute is set to 12 Å. Counterions are added to neutralize the system. Two steps of minimizations were employed to optimize the initial structure. In the first step, only the solvent molecules were relaxed, whereas protein atoms were constrained to their initial structure. In the second step, the entire system was energy-minimized until convergence was reached, and then the system was brought to room temperature (300 K) in 100 ps, followed by another 2 ns simulation with periodic boundary condition at 300 K and 1 atm (NPT ensemble). The time integration step is 2.0 fs and the SHAKE algorithm55 was

number of non-neighboring residues in close contact increases linearly because each residue can only have limited number of residues in its vicinity. Because the largest fragment has just three residues and contains less than 70 atoms consisting of C, H, O, N, and S, this method can be applied to high level ab initio calculation such as MP2. We tested the performance of the EE-GMFCC approach on a wide range of protein systems using HF, DFT, and MP2 methods with the 6-31G* basis set. The protein structures are downloaded from the Protein Data Bank (PDB). The protein structures were all optimized using the Sander module from the AMBER program48 to remove bad contacts prior to subsequent ab initio calculations. We calculated the total energy of each protein using the EE-GMFCC approach and the result is compared to the full system (nonfragmented) energy. All ab initio calculations were carried out using the Gaussian09 program.49 We also test the convergence of the EE-GMFCC energy with respect to the distance threshold λ, which determines whether the two-body interaction energy of the non-neighboring residues is calculated by QM or charge− charge interactions. B. Point Charge Models. Both the fixed charge model of Amber9450,51 and the polarized protein-specific charges (PPCs)52 have been utilized to describe the embedding field in this study. PPCs were fitted to electrostatic potentials by fragment quantum-mechanical calculations using an iterative approach as described in ref 52. Specifically, for the protein structures used in this work, the missing hydrogen atoms were added using the LEaP module of AMBER program.48 A series of minimizations using the Amber ff99SB force field53 was carried out to remove close contacts. The minimized structure 7152

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

deviations are all within 0.01 hartree when the distance threshold λ is equal to or great than 4.0 Å. Furthermore, we varied the distance threshold to test the convergence of the EEGMFCC method. The mean unsigned error (MUE) dropped from 3.79 to 2.39 kcal/mol when λ is increased from 3.0 to 4.0 Å; however, the MUE is almost flat when λ is increased from 4.0 to 4.5 Å (2.39 versus 2.35 kcal/mol). Figure 3 shows the EE-GMFCC calculated total energies of four proteins as a function of the distance threshold as reference to the full system results. The total energies are all close to convergence at λ = 4 Å. The closest non-neighboring fragment appears when λ is about 1.7−1.9 Å for these four proteins. Furthermore, one can see from Figure 3 that, the QM calculation for the twobody interactions of non-neighboring residues in close contact are indispensible, which account for 28.17, 28.15, 58.87, and 49.15 kcal/mol for proteins of 1AML, 1BBA, 2PPZ, and 1BZG, respectively. Hence, the two-body QM correction of vicinal non-neighboring residues is crucial to reproducing the full system energy in fragmentation calculations. We also calculated the EE-GMFCC energies using selfconsistently determined PPC point changes as shown in Table S1 of the Supporting Information. The overall error is slightly larger than that using fixed Amber94 charge models. The reason is not clear yet. The use of Amber94 charges can avoid self-consistently determining the point charges at the QM level, resulting in substantial computational saving for large protein calculation. Moreover, analytical atomic gradients can be readily obtained using a fixed point charge model. Therefore, we conclude that using the Amber94 charge model to approximate the electrostatic potential of the protein is a simpler way and without sacrificing much accuracy. For comparison, the total energies calculated using the GMFCC method (without the embedding field) for seven proteins are also shown in Table 1. It clearly shows that the errors from GMFCC calculations are significantly larger than those calculated by EE-GMFCC. The MUE of GMFCC is 40.95 kcal/mol for these seven proteins, which has 1 order of magnitude larger than that of EE-GMFCC. The comparison underscores that the environmental polarization has a substantial effect on the self-energy calculation for each fragment and thus should be properly taken into account for accurately reproducing the full system energy in fragment-based QM methods. Our study demonstrates that the embedding field using point charges is an efficient way to treat the manybody polarization effect in fragment calculations. B. DFT and MP2 Calculations Using the EE-GMFCC Approach. We also extend the EE-GMFCC approach to DFT (B3LYP) and MP2 levels of theory with the 6-31G* basis set. As shown in Table 2, the energies of six proteins evaluated using EE-GMFCC with λ = 4.0 Å agree well with the corresponding full system energies at the B3LYP/6-31G* level. However, the average error from B3LYP/6-31G* calculations is slightly larger than that from HF/6-31G*. In addition, the energies calculated by EE-GMFCC are all lower than the full system results. Because in the EE-GMFCC scheme, energy calculations are performed on small individual fragments whose sizes are all within 70 atoms (Tables 1 and 2), we can employ higher level ab initio methods such as MP2. This is an important advantage because higher level correlation methods are more accurate and capable of capturing dispersion energies that are not possible with the HF and DFT methods.58,59 For comparison, Table 3 shows the results of nine polypeptides calculated at the MP2/6-

applied to maintain all hydrogen atoms in reasonable positions. The particle-mesh Ewald56 method was used to treat longrange electrostatic interactions and a 10 Å cutoff for the van der Waals interactions was implemented. Langevin dynamics57 was applied to regulate the temperature with a collision frequency of 1.0 ps−1. Nineteen snapshots from each trajectory were selected every 100 ps for EE-GMFCC calculations. All the MD simulations were performed with the AMBER10 program.48

3. RESULTS AND DISCUSSION A. Accuracy of the EE-GMFCC Method at the HF/631G* Level. We first applied the EE-GMFCC approach to

Figure 3. Relative energies (in kcal/mol, red line) as a function of the distance threshold λ for four proteins. The total energy from the full system calculation (black line) is regarded as the reference.

Table 2. Total Energies of Six Three-Dimensional Globular Proteins Obtained from Standard Full System B3LYP/631G* Calculations and EE-GMFCC B3LYP/6-31G* Calculations with the Distance Threshold λ of 4.0 Åa

system 1LE1 2I9M 2LBG 2PPZ 2XL1 1WN8

no. of no. of basis atoms functions 218 246 349 608 243 354

1944 2069 2946 5111 2137 2913

full system (au)

deviation of EE-GMFCC (kcal/mol)

max. no. of atoms in fragments

−5503.239121 −6303.290348 −9441.002447 −15045.757643 −6220.495977 −8929.300270

−3.74 −1.53 −5.17 −7.65 −7.30 −6.92

60 63 53 63 65 59

a The point changes are taken from the Amber94 force field. Deviation denotes the energy difference between EE-GMFCC and full system calculations.

calculate the total energy of 18 proteins at the HF/6-31G* level. We chose a variety of proteins with mixed secondary structures of α-helix and β-sheet as listed in Table 1. The number of residues in those proteins ranges from 12 to 70, and the number of atoms ranges from 218 to 1142. The fixed charge model of Amber94 was used for background charges. Table 1 shows the deviation of total energies calculated using the EE-GMFCC method compared to standard full system calculations for those 18 globular proteins. It can be seen that the EE-GMFCC calculated energies have excellent agreement with the corresponding full system HF calculations. The 7153

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Table 3. Total Energies of Nine Small Proteins Obtained from Standard Full System MP2/6-31G* Calculations and EEGMFCC MP2/6-31G* Calculations with the Distance Threshold λ of 4.0 Åa system

no. of atoms

2XL1 (10−15)b 3RWJ (chain C) 2XL1 (10−19)b 3P4M (chain C) 3RZ2 (chain C) 3THK (chain C) 2FVJ (chain B) 4DS1 (chain B) 3RWF (chain C)

94 139 166 144 156 89 174 148 148

no. of basis functions full system MP2 energy (au) 838 1240 1502 1211 1261 737 1401 1297 1375

−2523.521626 −3530.769069 −4400.646301 −3387.940562 −3469.717360 −2019.637413 −3920.317527 −3853.963487 −4074.538654

deviation of EE-GMFCC (kcal/mol)

max. no. of atoms in fragments

3.19 5.80 4.23 4.33 5.52 2.95 2.83 5.29 1.11

51 52 60 57 60 44 59 48 51

a The point changes are taken from the Amber94 force field. Deviation denotes the energy difference between EE-GMFCC and full system calculations. bResidue number.

Figure 4. CPU time for the conventional and EE-GMFCC calculations as a function of the number of basis functions at the HF/6-31G* (top) and MP2/6-31G* (bottom) levels.

and full system is very good at the MP2/6-31G* level. All the errors are below 0.01 hartree. In contrast to what we observed in B3LYP/6-31G* calculations, the MP2/6-31G* energies calculated using EE-MFCC/MM are all slightly higher than the corresponding full system results.

31G* level. Owing to the limited computational resources, we extracted two polypeptide chains with different length from 2XL1 to calculate their full system energies and made comparison with the energies obtained from the EE-GMFCC scheme. The distance threshold λ for two-body interaction was also set to 4.0 Å. The overall agreement between EE-GMFCC 7154

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

fragment is within three amino acids, and the number of fragments is N − 2 (where N is the number of the amino acids in a given protein). The Gconcap introduced in EE-GMFCC has two amimo acids in each fragment, and the number of Gconcaps increases linearly with the size of the protein as each residue can only have a limited number of residues within the distance threshold λ. As a result, the computational scale of EEGMFCC method is exactly linear with a low prefactor. The total number (M) of fragments is given by P̅ N+N (4) 2 where P̅ is the average number of amino acids within the distance threshold λ from each residue. (N − 2) and (N − 3) represent the number of fragments Cap*i AiCapi+1, (i = 2, 3, ..., N −1) and Cap*i Capi+1, (i = 2, 3, ..., N −2) (see eq 1), respectively. The M fragment calculations can be performed on M computer nodes in parallel. The total computational cost (T) is given by M = (N − 2) + (N − 3) +

⎛ P̅ ⎞ T = α3̅ (N − 2) + α̅2⎜N − 3 + N ⎟ + α1̅ N ⎝ 2 ⎠

(5)

where α̅ 3, α̅2, and α̅1 are the average computational time for the capped fragment Cap*i−1AiCapi+1, the concap Cap*i Capi+1 or Gconcap, and the single residue, respectively, at the specified QM level. When N is a large number, eq 5 can be approximately obtained as ⎡ ⎤ ⎛ P̅ ⎞ T ≈ ⎢α3̅ + α̅2⎜1 + ⎟ + α1̅ ⎥N ⎝ ⎠ ⎦ 2 ⎣

(6)

Consequently, the computational CPU time scales linearly with a prefactor of α̅3 + α̅ 2[1 + (P̅/2)] + α̅ 1. Moreover, if the calculation is carried out on M computer nodes in parallel, the computational wall time can be approximately reduced to the maximum calculation time of one capped fragment Cap*i−1AiCapi+1. A comparison of the CPU time on the 8-core Intel Xeon E5620 2.4 GHz processor using the standard HF and EEGMFCC approaches is shown in Figure 4. As expected, the computational scale for the EE-GMFCC calculation is O(N), in contrast to O(N3) for the traditional HF calculation on the entire system. Figure 4 also shows the comparison of CPU time for the conventional MP2 and EE-GMFCC method on the 12core Intel Xeon E5645 2.4 GHz processor. As can be seen from the figure, the computational scale of the EE-GMFCC MP2 calculation is O(N) with a low prefactor as well, whereas the computational cost for traditional MP2 calculation on the entire system shows O(N4∼N5) as a function of the number of basis functions. For the smallest system with 89 atoms (3THK, Table 3), the EE-GMFCC MP2 calculation already shows 9 times speedup over the conventional MP2. Therefore, the basis set crossover point between conventional MP2 and EEGMFCC calculation falls below 737. Furthermore, the EEGMFCC method for MP2 calculations has more computational advantages over conventional ones in terms of disk space requirement and memory usage. D. Conformational Energies of Proteins. In previous sections, we have shown that the EE-GMFCC scheme has good agreement with full system results for total energy calculation of proteins. In this section, the EE-GMFCC scheme is applied to calculate the relative energies of different conformers of three globular proteins (PDB id: 1VTP (396 atoms, α-helix), 2KCF

Figure 5. Three representative three-dimensional protein structures studied in this work.

The EE-GMFCC calculated B3LYP/6-31G* and MP2/631G* energies using PPC charges are shown in Tables S2 and S3 of the Supporting Information, respectively. The overall performance of these two charge models (Amber94 and PPC) are close to each other; however, similar to the conclusion for HF calculations, the errors of EE-GMFCC using PPC are slightly larger than those using Amber94. Hence, we prefer to employ the mean-filed-like Amber94 charges in EE-GMFCC calculations. The performance of other charge models will be further investigated in future studies. C. Computational Efficiency of the EE-GMFCC Method. In the MFCC scheme, the size of each capped 7155

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Figure 6. Comparison of the relative energies of 19 conformers selected from 2 ns MD simulation between the standard full system calculations and different MFCC schemes at the HF/6-31g* level . The distance threshold λ was set to 4.0 Å. The total energy of the first conformer calculated using each method was taken as zero. FS denotes the full system result. EE-GMFCC_Amber94, EE-GMFCC_PPC, and GMFCC represent the fragment calculations performed with the electrostatically embedded Amber94 charges, with the PPC charges of the native structure and without the embedding field, respectively. 7156

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Table 4. Statistical Deviations of the Relative Energy (kcal/mol) Calculated by the EE-GMFCC and GMFCC Methods with Respect to the Full System Calculations on 18 Selected MD Snapshots for Each of Three Proteins at the HF/6-31G* Levela 1VTP

2KCF

system

EE-GMFCC _Amber94

EE-GMFCC _PPC

MIND MAXD MSE MUE RMSE

0.04 5.54 2.37 2.37 2.68

−1.10 3.61 1.15 1.46 1.74

GMFCC

EE-GMFCC _Amber94

EE-GMFCC _PPC

−0.99 28.59 15.21 15.32 17.02

−1.53 4.99 1.82 2.07 2.60

−1.96 7.34 1.65 2.36 2.83

1BHI GMFCC

EE-GMFCC _Amber94

EE-GMFCC _PPC

GMFCC

−35.02 −8.23 −22.27 22.27 23.41

−1.61 6.11 2.62 2.95 3.32

−0.75 7.62 3.43 3.56 4.12

−5.74 25.07 12.32 12.95 14.43

The total energy of the first conformer calculated using each method was taken as zero. The distance threshold λ was set to 4.0 Å. MIND, MAXD, MSE, MUE, and RSME denote the minimum deviation, the maximum deviation, mean signed error, mean unsigned error, and root-mean-squared error, respectively. EE-GMFCC_Amber94, EE-GMFCC_PPC, and GMFCC represent the fragment calculations are performed with the electrostatically embedded Amber94 charges, with PPC charges of the native structure, and without the embedding field, respectively. a

Figure 7. Similar to Figure 6c, but for M06/6-31G* calculations on 16 conformers of 1BHI.

for 18 conformers of 1VTP (with respect to the energy of the first conformer using the corresponding method, and the relative energies from full system calculations were taken as reference) based on EE-GMFCC_PPC and EE-GMFCC_Amber94 are only 1.46 and 2.37 kcal/mol, respectively, which indicates that the use of self-consistently determined PPC charges and fixed Amber94 charges to mimic the electrostatic potential are both sufficient to describe the relative energy profile for proteins. The maximum deviation for 1VTP is 3.61 and 5.54 kcal/mol for EE-GMFCC_Amber94 and EEGMFCC_PPC, respectively. Thus, although the EE-GMFCC method is capable of quantitatively describing the relative energy profile of proteins, it may not be able to correctly give the rank of the relative energies of two conformers whose energy difference is small than a certain threshold. For 1VTP, the threshold is about 5.4 kcal/mol using Amber94 charges (two times of the RMSE). As for the systems of 2KCF and 1BHI, the MUE and RMSE given by EE-GMFCC_Amber94 are both lower than EE-GMFCC_PPC as shown in Table 4, and again the GMFCC has large deviations from the full system results. Therefore, we conclude that the mean-field-like Amber94 charge model can give comparable accuracy with the self-consistently determined PPCs for EE-GMFCC calculations on conformational energies of proteins.

Table 5. Similar to Table 4, but for M06/6-31G* Calculations on 15 Conformers of 1BHI 1BHI system

EE-GMFCC_Amber94

EE-GMFCC_PPC

GMFCC

MIND MAXD MSE MUE RMSE

−0.83 6.07 2.60 2.82 3.28

−3.20 8.15 3.65 4.13 4.58

−8.85 23.20 11.00 12.19 14.21

(576 atoms, β-sheet), and 1BHI (591 atoms, mixture of α-helix and β-sheet), Figure 5). Figure 6 shows that relative energies of 19 conformers for each protein at the HF/6-31G* level. The energy profile based on EE-GMFCC with the embedding field of Amber94 charge model (EE-GMFCC_Amber94) is consistent with the one calculated from full system. On the other hand, the EE-GMFCC scheme using the embedding field of PPC charges from the native structure (EE-GMFCC_PPC) also has good agreement with the full system results for these systems. On the contrary, when we remove the embedding field for the fragment calculation, the deviations of relative energies are significant as compared to the full system calculation. Table 4 shows the statistical errors for EE-GMFCC_Amber94, EE-GMFCC_PPC, and GMFCC (without embedding field). The mean unsigned errors (MUE) of the relative energy 7157

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

intrafragment interaction, and interactions between fragment and background charges),

We also performed EE-GMFCC DFT calculations on the conformational energies of 1BHI at the M06/6-31G* level.60 Because the full system calculations for three conformers could not reach convergence, 16 conformers of 1BHI were selected for M06/6-31G* calculations. As shown in Figure 7, the overall trend is similar to what we have observed in HF/6-31G* calculations. Both EE-GMFCC_Amber94 and EEGMFCC_PPC give good agreement with the full system results. In addition, the errors become noticeable for GMFCC calculations without the embedding field. The MUE of the relative energy for 15 conformers based on EE-GMFCC_Amber94 and EE-GMFCC_PPC are only 2.82 and 4.13 kcal/mol, respectively (Table 5). The RMSE for EE-GMFCC_Amber94, EE-GMFCC_PPC, and GMFCC are 3.28, 4.58, and 14.21 kcal/mol, respectively. The distance threshold for M06 calculations was also set to 4.0 Å, which seems sufficient to obtain accurate total energies of proteins for M06 calculations as well.

N−2

EMFCC_int =

i=2

− E int(Cap*i Capi + 1:Rest)] + E int(Cap*N − 2 AN − 1CapN :Rest)

N−3

EMFCC_int =

∑ [Eint(A̅i Bi̅ :Rest(A̅i Bi̅ )) i=1

− E int(Bi̅ :Rest(Bi̅ ))] + E int(AN̅ − 2 BN̅ − 2 :Rest(AN̅ − 2 BN̅ − 2 ))

(A3)

where the interaction energy caused by the added hydrogen atoms is ignored, because the point charge of added nonpolar hydrogen is usually small and the interaction energies associated with the added hydrogen atoms are approximately canceled out in the MFCC scheme. The interaction energy between fragment A̅ iB̅ i and rest of the system can be decomposed into E int(A̅ i Bi̅ :Rest(A̅ i Bi̅ )) = E int(Bi̅ :Rest(A̅ i Bi̅ )) + E int(A̅ i :A̅ i (L)) + E int(A̅ i :Bi̅ (R )) (A4)

where A̅ i(L) and B̅ i(R) denote the remaining systems on the sequentially left-hand side of the fragment A̅ i and on the sequentially right-hand side of the fragment B̅ i, respectively. Hence, eq A3 can be approximately obtained as N−3

EMFCC_int = { ∑ [−E int(Bi̅ :A̅ i ) + E int(A̅ i :A̅ i (L))] i=1

+ E int(AN̅ − 2 BN̅ − 2 :AN̅ − 2 BN̅ − 2 (L))} N−3

+



∑ Eint(A̅i :Bi̅ (R)) i=1

(A5)

∑ i =N1− 3 [−E i n t (B̅ i :A̅ i )

The term + E i n t (A̅ i :A̅ i (L))] + Eint(A̅ N−2B̅ N−2:A̅ N−2B̅ N−2(L)) in eq A5 goes through all the pairwise interaction energy between fragments, which are calculated via interactions between the electron cloud of the fragment and the point-charges representing the remaining system, except the A̅ i:B̅ i pair (the interaction energy between A̅ i and B̅ i is computed within each QM calculation of the fragment A̅ iB̅ i). This combined QM/MM treatment to calculate the interaction energy is a reasonable approximation only when the background charges are sufficiently far away from the fragment. The last term ∑N−3 i=1 Eint(A̅ i:B̅ i(R)) in eq A5 is the pairwise interactions doubly counted, which needs to be deducted. For non-neighboring residues i and j that are in close contact (|Ri −

APPENDIX: DERIVATION OF THE EE-GMFCC EQUATION The first two terms in eq 3 can be rewritten as N−2

* Ai Cap ) − E(Cap *Cap )] ̃ ̃ ∑ [E(Cap i−1 i+1 i i+1 i=2

* AN − 1Cap ) ̃ + E(Cap N−2 N

(A2)

where Eint(Capi−1 * AiCapi+1:Rest) represents the interaction energy between the fragment Capi−1 * AiCapi+1 and the rest of the system (represented by point charges). For simplicity, we use A̅ i and B̅ i to represent the fragment Capi−1 * −NH− and the overlapped conjugate caps (H)Capi*Capi+1 (where (H) denotes that the added hydrogen atom on the left is excluded), respectively. Thus the fragment Cap*i−1AiCapi+1 is shortened to A̅ iB̅ i. Because the self-energies of added hydrogen atoms are completely canceled out in MFCC calculation, the interaction energy for eq A2 can be written as

4. CONCLUSIONS In this paper, we presented the electrostatically embedded generalized molecular fractionation with conjugate caps (EEGMFCC) method. In this scheme, fragment-based energies of neighboring residues and interaction energies of nonneighboring residues that are spatially in close contact are computed by quantum mechanics, with a key element being that all of the calculated fragments are embedded in a field of point charges, whereas the interaction energies between distant non-neighboring residues are treated by charge−charge interactions. Numerical studies are carried out to calculate the total energies of 18 real three-dimensional proteins of up to 1142 atoms using the EE-GMFCC approach at the HF/6-31G* level. The overall mean unsigned error of EE-GMFCC for the 18 proteins is 2.39 kcal/mol with reference to the full system HF/ 6-31G* energies when the distance threshold is set to 4.0 Å. The EE-GMFCC approach is further applied for proteins at the DFT and MP2 levels, also showing only a few kcal/mol deviation from the corresponding full system result. The EE-GMFCC method is computationally efficient and linear-scaling with a low prefactor. Every fragment normally contains fewer than 70 atoms consisting of C, H, O, N, and S. All of the individual QM calculations can be carried out at the MP2 level in parallel. This method can also be extended to coupled-cluster theory. Furthermore, it is possible to apply the EE-GMFCC method to perform structure optimization or molecular dynamics simulation of real protein systems by high level quantum mechanical methods. Research along these directions is underway in our laboratory.

̃ EMFCC =

∑ [Eint(Cap*i− 1AiCapi+ 1:Rest)

(A1)

Because the MFCC energy expression in eq A1 ensures that the self-energy of each fragment is counted once, here we only consider the interaction energy between residues (including the 7158

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A



Rj| ≤ λ, the distance of any two atoms between residues i and j is less than or equal to the distance threshold λ), we treat the two-body interaction energy quantum mechanically as follows, ΔE(2) =



*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



(A6)

ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (Grants No. 10974054 and 20933002) and Shanghai PuJiang program (09PJ1404000). We also thank the Computational Center of ECNU for providing us computational time. X.H. greatly acknowledges Prof. So Hirata for many helpful discussions on fragmentation methods.

because the self-energy of fragments i and j are canceled, we only consider the interaction energy between fragments, which can be approximately obtained as



[E int(i:j)QM + E int(ij:Rest(ij))QM:MM

i ,j>i+2 |R i − R j|≤ λ



− E int(i:j ∪ Rest(ij))QM:MM



[E int(i:j)QM − E int(i:j)QM:MM

i ,j>i+2 |R i − R j|≤ λ

− E int(j:i)QM:MM ]

(A7)

where Rest(ij) represents the rest of the system of fragments i and j, Eint(ij:Rest(ij))QM:MM denotes the QM/MM interaction energy between the QM fragment ij and background charges of Rest(ij). The QM/MM interaction energy between fragments i and j (−Eint(i:j)QM:MM − Eint(j:i)QM:MM, the last two terms in eq A7) approximately cancels the double counting in eq A1, resulting that the interaction energy of the dimer ij is calculated using quantum mechanics (Eint(i:j)QM). This cancellation treatment in the electrostatic embedding scheme is essentially the same as what has been described in the FMO method proposed by Kitaura and co-workers,17,18 and the EE-MB method developed by Dahlke and Truhlar.29−31 Here, we derive the equation in the framework of the EE-GMFCC approach. For the distant non-neighboring residues (|Ri − Rj| > λ), we use a simple charge−charge interaction approximation to deduct the double counting of pairwise interactions (∑N−3 i=1 Eint(A̅ i:B̅ i(R)) in eq A5) as follows, ⎧ ⎪ qm(k)qn(l) ⎪ ΔE = −⎨∑ ∑ − R ⎪ ⎪ k , l m , n m(k)n(l) ⎩

∑ ∑ i ,j>i+2 m′,n′ |R i − R j|≤ λ

⎫ qm ′ (i)qn ′ (j) ⎪ ⎪ ⎬ R m ′ (i)n ′ (j) ⎪ ⎪ ⎭ (A8)

where k = (H)Cap*i−1−NH, l = CO−Ai+2Ai+3···AN (i = 2, 3, ..., N − 2), and qm(k) represents the point charge of the mth atom in the fragment k. In summary, the final form of EE-GMFCC equation is given as eq 3. The choice of distance threshold λ is tested in the numerical calculations.



REFERENCES

(1) Daniels, A. D.; Scuseria, G. E. What Is the Best Alternative to Diagonalization of the Hamiltonian in Large Scale Semiempirical Calculations? J. Chem. Phys. 1999, 110, 1321−1328. (2) White, C. A.; Johnson, B. G.; Gill, P. M. W.; Head-Gordon, M. The Continuous Fast Multipole Method. Chem. Phys. Lett. 1994, 230, 8−16. (3) Korchowiec, J.; Lewandowski, J.; Makowski, M.; Gu, F. L.; Aoki, Y. Elongation Cutoff Technique Armed with Quantum Fast Multipole Method for Linear Scaling. J. Comput. Chem. 2009, 30, 2515−2525. (4) Fedorov, D. G.; Kitaura, K. Extending the Power of Quantum Chemistry to Large Systems with the Fragment Molecular Orbital Method. J. Phys. Chem. A 2007, 111, 6904−6914. (5) Exner, T. E.; Mezey, P. G. Ab Initio-Quality Electrostatic Potentials for Proteins: An Application of the ADMA Approach. J. Phys. Chem. A 2002, 106, 11791−11800. (6) Friesner, R. A.; Murphy, R. B.; Beachy, M. D.; Ringnalda, M. N.; Pollard, W. T.; Dunietz, B. D.; Cao, Y. X. Correlated Ab Initio Electronic Structure Calculations for Large Molecules. J. Phys. Chem. A 1999, 103, 1913−1928. (7) Goedecker, S. Linear Scaling Electronic Structure Methods. Rev. Mod. Phys. 1999, 71, 1085−1123. (8) Gordon, M. S.; Fedorov, D. G.; Pruitt, S. R.; Slipchenko, L. V. Fragmentation Methods: A Route to Accurate Calculations on Large Systems. Chem. Rev. 2012, 112, 632−672. (9) Kohn, W. Density Functional and Density Matrix Method Scaling Linearly with the Number of Atoms. Phys. Rev. Lett. 1996, 76, 3168− 3171. (10) Scuseria, G. E. Linear Scaling Density Functional Calculations with Gaussian Orbitals. J. Phys. Chem. A 1999, 103, 4782−4790. (11) Strain, M. C.; Scuseria, G. E.; Frisch, M. J. Achieving Linear Scaling for the Electronic Quantum Coulomb Problem. Science 1996, 271, 51−53. (12) Yang, W. T. Direct Calculation of Electron-Density in DensityFunctional Theory. Phys. Rev. Lett. 1991, 66, 1438−1441. (13) Yang, W. T.; Lee, T. S. A Density-Matrix Divide-and-Conquer Approach for Electronic-Structure Calculations of Large Molecules. J. Chem. Phys. 1995, 103, 5674−5678. (14) Schwegler, E.; Challacombe, M. Linear Scaling Computation of the Hartree-Fock Exchange Matrix. J. Chem. Phys. 1996, 105, 2726− 2734. (15) White, C. A.; Johnson, B. G.; Gill, P. M. W.; Head-Gordon, M. Linear Scaling Density Functional Calculations via the Continuous Fast Multipole Method. Chem. Phys. Lett. 1996, 253, 268−278. (16) He, X.; Merz, K. M. Divide and Conquer Hartree-Fock Calculations on Proteins. J. Chem. Theor. Comput. 2010, 6, 405−411. (17) Nakano, T.; Kaminuma, T.; Sato, T.; Fukuzawa, K.; Akiyama, Y.; Uebayasi, M.; Kitaura, K. Fragment Molecular Orbital Method: Use of Approximate Electrostatic Potential. Chem. Phys. Lett. 2002, 351, 475− 480. (18) Fedorov, D. G.; Ishimura, K.; Ishida, T.; Kitaura, K.; Pulay, P.; Nagase, S. Accuracy of the Three-Body Fragment Molecular Orbital

− E int(j:i ∪ Rest(ij))QM:MM ] =

AUTHOR INFORMATION

Corresponding Author

(Eij̃ − Eĩ − Ej̃ )QM

i ,j>i+2 |R i − R j|≤ λ

(2) ΔE int =

Article

ASSOCIATED CONTENT

S Supporting Information *

Tables S1, S2, and S3 show the deviation of the calculated total energy between EE-GMFCC using PPC charges and standard full system calculations at the HF, DFT, and MP2 levels, respectively. Complete ref 49. This material is available free of charge via the Internet at http://pubs.acs.org. 7159

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Method Applied to Moller-Plesset Perturbation Theory. J. Comput. Chem. 2007, 28, 1476−1484. (19) Fedorov, D. G.; Kitaura, K. Second Order Moller-Plesset Perturbation Theory Based upon the Fragment Molecular Orbital Method. J. Chem. Phys. 2004, 121, 2483−2490. (20) Gadre, S. R.; Ganesh, V. Molecular Tailoring Approach: Towards PC-Based Ab Initio Treatment of Large Molecules. J. Theor. Comput. Chem. 2006, 5, 835−855. (21) Elango, M.; Subramanian, V.; Rahalkar, A. P.; Gadre, S. R.; Sathyamurthy, N. Structure, Energetics, And Reactivity of Boric Acid Nanotubes: A Molecular Tailoring Approach. J. Phys. Chem. A 2008, 112, 7699−7704. (22) Babu, K.; Gadre, S. R. Ab Initio Quality One-Electron Properties of Large Molecules: Development and Testing of Molecular Tailoring Approach. J. Comput. Chem. 2003, 24, 484−495. (23) Collins, M. A.; Deev, V. A. Accuracy and Efficiency of Electronic Energies from Systematic Molecular Fragmentation. J. Chem. Phys. 2006, 125, 104104. (24) Mullin, J. M.; Roskop, L. B.; Pruitt, S. R.; Collins, M. A.; Gordon, M. S. Systematic Fragmentation Method and the Effective Fragment Potential: An Efficient Method for Capturing Molecular Energies. J. Phys. Chem. A 2009, 113, 10040−10049. (25) Deev, V.; Collins, M. A. Approximate Ab Initio Energies by Systematic Molecular Fragmentation. J. Chem. Phys. 2005, 122, 154102. (26) Li, S.; Li, W.; Fang, T. An Efficient Fragment-Based Approach for Predicting the Ground-State Energies and Structures of Large Molecules. J. Am. Chem. Soc. 2005, 127, 7215−7226. (27) Li, W.; Li, S.; Jiang, Y. Generalized Energy-Based Fragmentation Approach for Computing the Ground-State Energies and Properties of Large Molecules. J. Phys. Chem. A 2007, 111, 2193−2199. (28) Hua, S. G.; Hua, W. J.; Li, S. H. An Efficient Implementation of the Generalized Energy-Based Fragmentation Approach for General Large Molecules. J. Phys. Chem. A 2010, 114, 8126−8134. (29) Dahlke, E. E.; Truhlar, D. G. Electrostatically Embedded ManyBody Expansion for Large Systems, With Applications to Water Clusters. J. Chem. Theor. Comput. 2007, 3, 46−53. (30) Dahlke, E. E.; Truhlar, D. G. Electrostatically Embedded ManyBody Correlation Energy, With Applications to the Calculation of Accurate Second-Order Moller-Plesset Perturbation Theory Energies for Large Water Clusters. J. Chem. Theor. Comput. 2007, 3, 1342− 1348. (31) Dahlke, E. E.; Truhlar, D. G. Electrostatically Embedded ManyBody Expansion for Simulations. J. Chem. Theor. Comput. 2008, 4, 1− 6. (32) Exner, T. E.; Mezey, P. G. The Field-Adapted ADMA Approach: Introducing Point Charges. J. Phys. Chem. A 2004, 108, 4301−4309. (33) Exner, T. E.; Mezey, P. G. Evaluation of the Field-Adapted ADMA Approach: Absolute and Relative Energies of Crambin and Derivatives. Phys. Chem. Chem. Phys. 2005, 7, 4061−4069. (34) Chen, X.; Zhang, Y.; Zhang, J. Z. An Efficient Approach for Ab Initio Energy Calculation of Biopolymers. J. Chem. Phys. 2005, 122, 184105. (35) He, X.; Zhang, J. Z. A New Method for Direct Calculation of Total Energy of Protein. J. Chem. Phys. 2005, 122, 31103. (36) He, X.; Zhang, J. Z. The Generalized Molecular Fractionation with Conjugate Caps/Molecular Mechanics Method for Direct Calculation of Protein Energy. J. Chem. Phys. 2006, 124, 184703. (37) Zhang, D. W.; Zhang, J. Z. H. Molecular Fractionation with Conjugate Caps for Full Quantum Mechanical Calculation of ProteinMolecule Interaction Energy. J. Chem. Phys. 2003, 119, 3599−3605. (38) Gao, A. M.; Zhang, D. W.; Zhang, J. Z. H.; Zhang, Y. K. An Efficient Linear Scaling Method for Ab Initio Calculation of Electron Density of Proteins. Chem. Phys. Lett. 2004, 394, 293−297. (39) Chen, X. H.; Zhang, J. Z. Molecular Fractionation with Conjugated Caps Density Matrix with Pairwise Interaction Correction for Protein Energy Calculation. J. Chem. Phys. 2006, 125, 44903.

(40) Mei, Y.; He, X.; Ji, C. G.; Zhang, D. W.; John, Z. H. Z. A Fragmentation Approach to Quantum Calculation of Large Molecular Systems. Prog. in Chem. 2012, 24, 1058−1064. (41) Inadomi, Y.; Nakano, T.; Kitaura, K.; Nagashima, U. Definition of Molecular Orbitals in Fragment Molecular Orbital Method. Chem. Phys. Lett. 2002, 364, 139−143. (42) Jiang, N.; Ma, J.; Jiang, Y. Electrostatic Field-Adapted Molecular Fractionation with Conjugated Caps for Energy Calculations of Charged Biomolecules. J. Chem. Phys. 2006, 124, 114112. (43) Hirata, S.; Valiev, M.; Dupuis, M.; Xantheas, S. S.; Sugiki, S.; Sekino, H. Fast Electron Correlation Methods for Molecular Clusters in the Ground and Excited States. Mol. Phys. 2005, 103, 2255−2265. (44) He, X.; Sode, O.; Xantheas, S. S.; Hirata, S. Second-Order Many-Body Perturbation Study of Ice Ih. J. Chem. Phys. 2012, 137, 204505. (45) Mayhall, N. J.; Raghavachari, K. Many-Overlapping-Body (MOB) Expansion: A Generalized Many Body Expansion for Nondisjoint Monomers in Molecular Fragmentation Calculations of Covalent Molecules. J. Chem. Theor. Comput. 2012, 8, 2669−2675. (46) Le, H. A.; Tan, H. J.; Ouyang, J. F.; Bettens, R. P. A. Combined Fragmentation Method: A Simple Method for Fragmentation of Large Molecules. J. Chem. Theor. Comput 2012, 8, 469−478. (47) Fedorov, D. G.; Kitaura, K. Coupled-Cluster Theory Based upon the Fragment Molecular-Orbital Method. J. Chem. Phys. 2005, 123, 134103. (48) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668−1688. (49) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; et al. GAUSSIAN 09, Revision B.01; Gaussian, Inc.: Wallingford, CT, 2010. (50) Bayly, C. I.; Cieplak, P.; Cornell, W. D.; Kollman, P. A. A WellBehaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges - the Resp Model. J. Phys. Chem. 1993, 97, 10269−10280. (51) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A 2nd Generation Force-Field for the Simulation of Proteins, Nucleic-Acids, and Organic-Molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197. (52) Ji, C. G.; Mei, Y.; Zhang, J. Z. H. Developing Polarized ProteinSpecific Charges for Protein Dynamics: MD Free Energy Calculation of pKA Shifts for Asp26/Asp20 in Thioredoxin. Biophys. J. 2008, 95, 1080−1088. (53) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct. Funct. Bioinform. 2006, 65, 712−725. (54) Rocchia, W.; Sridharan, S.; Nicholls, A.; Alexov, E.; Chiabrera, A.; Honig, B. Rapid Grid-Based Construction of the Molecular Surface and the Use of Induced Surface Charge to Calculate Reaction Field Energies: Applications to the Molecular Systems and Geometric Objects. J. Comput. Chem. 2002, 23, 128−137. (55) Jean-Paul, R.; Giovanni, C.; Herman, J. C. B. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327−341. (56) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An Nlog(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089−10092. (57) Pastor, R. W.; Brooks, B. R.; Szabo, A. An Analysis of the Accuracy of Langevin and Molecular-Dynamics Algorithms. Mol. Phys. 1988, 65, 1409−1419. (58) Vondrasek, J.; Bendova, L.; Klusak, V.; Hobza, P. Unexpectedly Strong Energy Stabilization Inside the Hydrophobic Core of Small Protein Rubredoxin Mediated by Aromatic Residues: Correlated Ab 7160

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161

The Journal of Physical Chemistry A

Article

Initio Quantum Chemical Calculations. J. Am. Chem. Soc. 2005, 127, 2615−2619. (59) Molnar, L. F.; He, X.; Wang, B.; Merz, K. M., Jr. Further Analysis and Comparative Study of Intermolecular Interactions Using Dimers from the S22 Database. J. Chem. Phys. 2009, 131, 065102. (60) Zhao, Y.; Truhlar, D. G. The M06 Suite of Density Functionals for Main Group Thermochemistry, Thermochemical Kinetics, Noncovalent Interactions, Excited States, And Transition Elements: Two New Functionals and Systematic Testing of Four M06-Class Functionals and 12 Other Functionals. Theor. Chem. Acc. 2008, 120, 215−241.

7161

dx.doi.org/10.1021/jp400779t | J. Phys. Chem. A 2013, 117, 7149−7161