TMFF—A Two-Bead Multipole Force Field for Coarse-Grained

Oct 26, 2016 - Coarse-grained (CG) models are desirable for studying large and complex biological systems. In this paper, we propose a new two-bead mu...
2 downloads 5 Views 3MB Size
Article pubs.acs.org/JCTC

TMFFA Two-Bead Multipole Force Field for Coarse-Grained Molecular Dynamics Simulation of Protein Min Li,† Fengjiao Liu,† and John Z. H. Zhang*,†,‡,§ †

School of Chemistry and Molecular Engineering and School of Physics and Materials Science, East China Normal University, Shanghai 200062, China ‡ NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China § Department of Chemistry, New York University, New York, NY 10003, USA S Supporting Information *

ABSTRACT: Coarse-grained (CG) models are desirable for studying large and complex biological systems. In this paper, we propose a new two-bead multipole force field (TMFF) in which electric multipoles up to the quadrupole are included in the CG force field. The inclusion of electric multipoles in the proposed CG force field enables a more realistic description of the anisotropic electrostatic interactions in the protein system and, thus, provides an improvement over the standard isotropic two-bead CG models. In order to test the accuracy of the new CG force field model, extensive molecular dynamics simulations were carried out for a series of benchmark protein systems. These simulation studies showed that the TMFF model can realistically reproduce the structural and dynamical properties of proteins, as demonstrated by the close agreement of the CG results with those from the corresponding all-atom simulations in terms of root-mean-square deviations (RMSDs) and root-mean-square fluctuations (RMSFs) of the protein backbones. The current two-bead model is highly coarse-grained and is 50-fold more efficient than all-atom method in MD simulation of proteins in explicit water.

1. INTRODUCTION All-atom (AA) molecular dynamics (MD) simulation1 has played important roles in our fundamental understanding of many biological processes for small-protein systems at the atomic-level (e.g., protein−ligand interaction, small-protein folding, conformational dynamics, etc.). However, due to the limitation of the present computing power, it is still impractical to apply AA MD simulation to explore biologically meaningful phenomena beyond the microsecond time scale and nanometer length scale, such as HIV-1 viral capsid assembly and cytoskeletal motions. The development of coarse-grained (CG) models2−6 opened doors for possible MD simulation of larger biomolecule systems. By using a single bead (CG site) to represent a group of atoms or residues, one can significantly reduce the computational cost in MD simulation of large biomolecular systems by eliminating high-frequency atomic motions. The use of CG models can typically reduce simulation cost by orders of magnitude and is thus highly attractive in MD simulation of large biomolecular systems. Also, due to the smoother free energy landscape, CG MD makes it easy to sample more conformation space and therefore help accelerate the sampling efficiency of biological processes. The development of CG models has attracted significant interest in computational biology.4,5,7−14 Recently, the PACE force field (FF) developed by Wu’s group15,16 correctly reproduced folding behaviors of peptides by implicitly © 2016 American Chemical Society

incorporating the hydrogen interactions in the heavy-atom interactions, but it is still expensive in computational cost for large biological systems. Izvekov and Voth developed a multiscale coarse-grained approach (MS-CG)17,18 and derived the effective CG potential parameters directly from atomic MD simulation by force-matching. Another self-learning multiscale CG approach, developed by Takada’group,19 allows one to switch between AA simulation and CG simulation for the system of study. In the popular Martini FF,20−24 Marrink and co-workers developed a popular residue-specific CG model for proteins in which a four-to-one map was adopted for both amino acids and water molecules. This CG model is constructed on given secondary structures of proteins. This standard Martini FF treats the charged beads of protein by a point-charge model and adopts a simple single-point water model. Subsequently, polarizable beads were introduced for the charged and polar side chains to improve the description of the side-chain dynamics of proteins in the Martini FF.25 The Martini three-site polarizable water model26,27 was also proposed to reproduce several experimental properties of water. The Martini FF has been extensively applied to studying the structure and dynamics of biomolecular systems, lipid and Received: August 2, 2016 Published: October 26, 2016 6147

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation surfactant,21,22 peptide−bilayer,20 crystalline cellulose microfibers,28 viral capsid, and protein−protein complexes.29 Ha-Duong et al.30,31 proposed another residue-specific CG model for protein, which is independent of the secondary structures of the protein’s native configuration and uses about 2−3 CG beads to represent each amino acid. In this CG model,31 extra positively charged sites are linked to backbone CG beads by the springs, similar to the classical Drude-like oscillator32,33 in AA model, to mimic the dipole−dipole interactions of backbone dynamics and stabilize the secondary structures during CG simulations. A simple two-bead model for amino acids, one for the backbone and another for the side chain, was developed and parametrized on the statistical analysis of a set of experimental structures by Bahar and Jernigan.34 This two-bead model was employed to predict peptide binding to HIV-1 protease by Kurt and co-workers.35 Recently, a helix-propensity potential is added into the twobead FF to improve the folding dynamics for protein HP-36 and protein beta-amyloid by Mukherjee and Bagchi.36,37 Another coarse-grained model similar to the two-bead mapping is parametrized and tested by Gu and co-workers,38 who showed that the model is able to maintain the native-like protein structure during microsecond-level simulations. In addition to the above two-bead model,39,40 similar models were also proposed, such as Levitt’s simplified model41 and the united-residue FF by Scheraga and co-workers42 and so on. For most low-resolution CG models of proteins, such as the standard Martini FF21,22 and the two-bead model,34,39,40 electrostatic interactions are either neglected or represented for charged and polar side chains by placing simple pointcharges on the beads without proper consideration for the anisotropic nature of electrostatic interactions. However, introducing the polarizable or charged auxiliary beads into the CG model certainly increases the number of the CG beads. An alternative strategy is to describe the multipole moments of CG beads by a local frame. In order to describe the anisotropic electrostatic interactions between CG beads during protein MD simulation, Ren and co-workers43−45 combined point multipole interactions and Gay-Berne potentials into CG models for rigid-body MD simulations of benzene and methanol and hydrogen-bonding liquids. This multipole-based model was used to parametrize the residue-specific CG model for proteins by Li and co-workers, namely the GBEMP model,45−47 and this model was found to have excellent correlation with the AA AMOEBA FF.48−50 In addition, Peraro and co-workers51−53 constructed a map for 20 amino acids, similar to the Martini CG map, and described dipole properties for the polar side chains and backbone CG beads by building local frames. In their CG model, the dipoles for backbone and side-chain CG beads in multiscale simulations were constructed to correctly reproduce the all-atom electrostatic field. In this work, we develop a two-bead multipole force field (TMFF) CG model by incorporating electric multipoles into the CG sites in order to represent the anisotropic interactions at the CG level. In the present TMFF model, the multipoles of the CG beads are described by a local-frame strategy. We also reparametrized all pseudobond, pseudoangle, and pseudodihedral interaction terms for 20 amino acids between CG sites by a statistic analysis for about 8000 high-resolution crystal structures. As expected, the root-mean-square fluctuation (RMSF) from TMFF agrees well with that from AA FF (Amber ff14SB).54 Just like the Martini FF,20−22 the TMFF is

lower-resolution, transferable, and general, and it can be conveniently applied to large protein systems. This paper is organized as follows. Section 2 describes the two-bead modeling, the TMFF, and the MD simulation sets. Section 3.1 shows parametrization of the TMFF. Sections 3.2 and 3.3 show, respectively, the results of six tested proteins by TMFF and AA FF and application for adenylate kinase. Section 3.4 shows the efficiency of the TMFF. Section 4 draws the conclusion and points out the limits of the current TMFF.

2. THEORY AND METHODS 2.1. Two-Bead Multipole Force Field (TMFF). Our method adopts a two-bead model:34 one represents the backbone at the Cα atom and the other that for the side chain located at the center of mass of the side chain. Two types of CG beads are denoted, Bi for the backbone (i denotes amino acid) and Si for the side chain. Each CG site is treated as a sphere, and its mass is derived from the sum of all atoms included in the CG group. For glycine, only the backbone CG bead is considered. An illustration of this two-bead model is shown in Figure 1.

Figure 1. Ball-and-stick model depicting the two-bead CG model. The pseudo bond, pseudo angle, and pseudo dihedral angle are respectively noted by b, θ, and φ. The nonbonded interactions in the TMFF are also defined as the like atomistic model.

We then construct the physically based potential to describe the interaction energy of the TMFF as given below: U = Ubond + Uangle + Utorsion + Uvdw + Ueleperm

(1)

where the first three terms Ubond, Uangle, and Utorsion describe the valence-related potential energy (pseudobond stretching, pseudoangle bending, pseudotorsion rotating) and the last two terms Uvdw and Uperm ele , respectively, describe the nonbonded van der Waals (vdW) and electrostatic permanent-multipole interactions between CG sites. However, in view of the large computational cost involved in MD simulations with induced multipoles, we did not include induced dipole and induced quadrupole terms in the term of Uperm ele in eq 1 in our present study. The definitions of pseudo bond, pseudo angle, pseudo torsion, and nonbonded interactions are given in Figure 1. Each term in eq 1 will be explained in detail below. 2.2. TMFF Potential. 2.2.1. Bonded Potential. The pseudo bond energy terms between two consecutive CG beads are given by the following expression, Ubond = Kb(b − b0)2 6148

(2) DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation

Figure 2. Examples of the “z-then-x” local frame definition. (a) The local frames for each side-chain atom of alanine in the AMOEBA FF. MAA‑H and MAA‑C represent the multipole moments in eq 8 for the H atom and C atom in the AMOEBA FF, respectively. (b) The BB-SC-BB local frame employed during the CG MD simulation. MCG‑BB and MCG‑SC represent the multipole moments in eq 8 for backbone beads and side-chain beads in the TMFF, respectively. (c) The CA-SC-COM local frame for alanine in the TMFF parameter file.

εij =

with an harmonic potential force constant Kb and equilibrium length b0, determined by fitting the energy distribution of the pseudo bond. For the pseudo angular energy, we expand the potential energy in a basis set of Gaussian functions as follows,38 N

Uangle / torsion =

2⎤ ⎡ ⎛ x − bm ⎞ ⎥ ⎟ ⎣ ⎝ cm ⎠ ⎥⎦

∑ am exp⎢⎢−⎜

m=1

cij =

(3)

cii + cjj (7)

2

Mi = [qi , uix , uiy , uiz , Q ixx , Q ixy , Q ixz , ..., Q izz]T

(8)

where qi is the point charge, ui is the dipole, and Qi is the quadrupole, all located at CG site i in the Cartesian representation, and where T denotes the matrix transpose. In our approach, all multipoles for CG site i are obtained from the AMOEBA polarizable FF48−50 by combining all atomic multipoles within the same coordinate frame and placing them at the site of the i-th CG bead. For example, all sets of atomic multipoles (MAA‑H and MAA‑C in Figure 2(a)) of the side-chain atoms of alanine in the local frame are first transformed to those in the global frame of the laboratory coordinate and all multipoles are summed to derive the global CG multipoles for the current side-chain CG bead. Then the global CG multipoles are transformed back to those (MCG‑BB and MCG‑SC in Figure 2(b)) in the local frame constructed by the CG system to obtain local permanent multipoles for each CG bead of TMFF. We adopted the similar “z-then-x” definition of refs 48−50 in the local frame as shown in Figures 2(b) and 2(c). Figure 2(b) shows the BB-SC-BB definition (BB represents the backbone bead, the second BB means the next backbone CG bead, SC represents a side-chain bead) which was employed during the TMFF simulation. Since two beads for one amino acid fail to construct the three-dimensional Cartesian coordinate, Figure 2(c) shows another “z-then-x” definition for residue alanine: the CA-SC−COM definition (CA represents a Cα-atom, COM represents the center of mass of the backbone CG group) which was employed in the TMFF parameter file in Table S6 of the Supporting Information. It is noted that the global CG multipoles for each CG bead would

(4)

where KB is the Boltzmann constant, T is the temperature, and P is the probability. It should be mentioned here that in the above procedure to determine pseudobonded (pseudobond, pseudoangle, pseudotorsion) potential parameters, the contributions from the intramolecular vdW and electrostatic interactions are not excluded. To mitigate this effect, the distributions of bonded terms are weighted by a factor.58 In this work, the torsional PMF was weighted by a factor of 3.0. 2.2.2. van der Waals Potential. The vdW interaction Uvdw between CG sites i and j adopts a standard 6−12 LennardJones potential, expressed as ⎡⎛ ⎞12 ⎛ ⎞6 ⎤ cij cij Uvdw(ij) = 4εij⎢⎢⎜⎜ ⎟⎟ − ⎜⎜ ⎟⎟ ⎥⎥ r ⎝ rij ⎠ ⎦ ⎣⎝ ij ⎠

(6)

The parameters εii and cii for CG particles were obtained by fitting the average interaction potential of the atomic group when the homodimer was steered along typical directions using the Amber ff14SB atomistic FF.54 2.2.3. Electric Multipole Potential. Considering the importance of anisotropic electrostatic interactions, we introduced dipoles and quadrupoles into multipole interactions beyond the charge interaction term for every CG site. For CG site i, the permanent multipoles are expressed as a column matrix given below

where x represents the pseudo bond angle or the torsional angle. In eq 3, N is the number of Gaussian functions used for fitting the potential, and am, bm, and cm are the parameters determined by fitting the potential to the experimental distribution of the corresponding angles. To correctly describe the valence-related dynamics of the two-bead model, a total of 7811 refined X-ray crystal structures from the protein data bank (PDB)55 with resolution less than 0.2 nm and the R factor less than 0.2 were selected and analyzed to generate the statistical distributions of pseudo bonds, angles, and torsions. Then potentials of mean force (PMF) were calculated via the Boltzmann conversion method31,56,57 by the following equation: U = −KBT ln(P)

εii ·εjj

(5)

where cij represents the effective minimum distance between two approaching CG particles, εij is the strength of their interaction, and rij is the distance between CG sites i and j. For the heterogeneous CG pair, the following Lorentz−Berthelot mixing rules were adopted: 6149

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation

from 0 to 300 K through a series of simulations in 1 ns. After a 2 ns NPT equilibrium at 300 K and 1 bar pressure with isotropic position scaling, MD simulations were carried out up to 100 ns for six proteins except 1ANK and 50 ns for the 1ANK protein at 300 K and 1 bar pressure. In these AA simulations, bonds involving hydrogen are constrained using the SHAKE algorithm. Specific information about these systems in present MD simulations is listed in Table 3. In these AA and CG simulations, the temperature was scaled by the Langevin thermostat method and the pressure was scaled by the Berendsen barostat scheme for constant pressure dynamics. Initial velocities of atoms or pseudo beads were generated randomly at the beginning of the simulation. All time steps are set as 2 fs and cutoff values as 12 Å during AA and CG simulations. Besides, in both AA and CG simulations, the periodic boundary condition was adopted to avoid the solvent boundary artifact and electrostatic multipole interactions were treated by the Ewald summation method69 using a 64 × 64 × 64 grid for the discrete fast Fourier transform. Our study indicates that the effect of quadrupole interactions is relatively small and insignificant, and thus, we dropped the quadrupole term in the present study. The MD time step used in the present TMFF simulation is 2 fs, and the upper limit of the time step in the TMFF simulation is around 6 fs, and beyond that, the dynamic results of proteins in the TMFF simulation become unreliable and agree badly with the AA simulation.

be updated from local permanent multipoles during the TMFF simulation each time as the positions of the CG beads change. The multipole interactions between sites i and j can be expressed by the matrix formalism48,49 in the following equation: Ueleperm(rij) = MiT TijMj

(9)

where rji is the distance between CG sites i and j. As described by the AMOEBA FF,48−50 the multipoles are defined with respect to a local reference frame during MD simulation. Here, Mi and Mj are the global multipoles transformed from TMFF local multipoles. The tensor term Tij is given by the following equation 1

∂ ∂xj

∂ ∂yj

∂ ∂zj



∂ ∂2 ∂2 ∂2 ⋯ ∂xi ∂xi ∂xj ∂xi ∂yj ∂xi ∂zj Tij =

∂2 ∂yi ∂xj

∂2 ∂yi ∂yj

∂2 ⋯ ∂yi ∂zj

∂2 ∂ ∂zi ∂zi ∂xj

∂2 ∂zi ∂yj

∂2 ⋯ ∂zi ∂zj

∂ ∂yi











⎛1⎞ ⎜⎜ ⎟⎟ ⎝ rji ⎠

3. RESULTS AND DISCUSSION 3.1. Parametrization of TMFF. Figure 3 shows the PMF profiles of pseudo bonds and pseudo angles of crystal structures calculated by eq 4. The obtained force constant for the backbone pseudo bond is approximately 325.63 kcal/mol, and that for the side-chain pseudo bonds is set to an averaged value of 303.50 kcal/mol. All values of b0 for side-chain pseudo bonds in eq 2 are derived from the minimum values of PMF profiles and are listed in Table 1. Most values in Table 1 are essentially consistent with those of Gu and co-workers.38 Also, the parameter b0 for the backbone pseudo bond is given an averaged value of 3.8 Å. From the PMF profiles in Figures 3(b) and 3(c), it is clear that the potential energies for both the backbone and side chain pseudo angles could not be correctly represented by a single harmonic potential, and thus, a series of Gaussian functions are employed to fit these pseudo angle potentials. The B−B−B PMF profile in Figure 3(b) shows two wells at, respectively, 90° and 120°, similar to that found in Bahar and Jernigan’s work.34 These two wells are associated with the helix and sheet structures, and the entire PMF profile is accurately reproduced by fitting five Gaussian functions as shown in Figure 3(b). In this work, the parameters of various Gaussian functions for B− B−B, B−B−Si, and Si−B−B are listed in Tables S1−S3. The distributions of pseudo dihedral angles in CG models are quite different from those in the AA model. To tackle the complicated dihedral angle potentials for the backbone, Dewitte and Shakhnovich70 sorted 20 amino acids into three groups: the helix formers (GLU, ALA, LEU, MET, GLN, LYS, ARG, HIS, named H-group), the sheet formers (VAL, ILE, TYR, CYS, TRP, PHE, THR, named S-group), and turn formers (GLY, ASN, PRO, SER, ASP, named T-group). We adopted the same strategy and generated nine statistical distributions for B−Bp− Bq−B (p,q = H, S, T) from the crystal structure data and fitted them to eq 3. Figure 4(a) plotted four fitted PMF profiles, and these profiles all show two wells at, respectively, 50° and

(10)

For periodic systems, the electrostatic multipole interactions are treated by the Ewald summation method developed by Smith et al.59 and the parameters for Ewald summation are exactly the same as in the AA FF. 2.3. Molecular Dynamics Simulations. In this work, the AMOEBA AA FF48−50 (pmemd.amoeba scripts in Amber14 software60) was modified to adapt to applications using the TMFF. With little effort, the Martini single-point water model, which is described only by the vdW potential in eq 5, can be directly employed to couple with the TMFF. Here, the vdW parameters of the Martini single-point water model derive from Martini FF v2.2 (εii = 5.0 kJ/mol, cii = 0.47 nm in eq 5 for water−water interaction). In this work, a total of seven proteins (PDBID: 1D3Z,61 1FKS,62 3GB1,63 1CYE,64 1FW7,65 2AAS,66 1ANK67) were studied using the TMFF. In our CG simulations of these systems, the proteins were solvated with the Martini CG water model20 (every four water molecules are modeled by a single CG particle) in the cubic boxes. The systems were first minimized in which 1,000 cycles by the steepest descent method occurred and 4,000 cycles by the conjugate gradient method followed, and then they were heated from 0 to 300 K in 0.1 ns. After a 0.2 ns NPT equilibrium at 300 K with isotropic position scaling, multiple MD simulations (87 ns for six proteins except 1ANK and 25 ns for the 1ANK protein) were carried out at 300 K and 1 bar pressure with anisotropic pressure scaling. For the purpose of comparison, all-atom MD simulations for these systems were also performed by the Amber ff14SB FF (pmemd.cuda scripts in Amber14 software).54 In all-atom simulations, a cubic box filled with TIP3P water molecules68 was constructed to solvate these proteins. Initial systems were initially minimized in which 2,000 cycles by the steepest descent method occurred and 8,000 cycles by the conjugate gradient method followed, and then they were heated 6150

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation

Table 1. Equilibrium Bond Lengths of 19 Bi−Si Pseudo Bonds

a

i (Bi−Si)a

b0(Å)

i (Bi−Si)

b0(Å)

A C D E F H I K L M

1.601 2.371 2.513 2.797 3.405 3.183 2.205 3.203 2.663 2.983

N P Q R S T V W Y

2.532 1.921 2.781 4.016 1.973 1.998 2.003 3.821 3.843

i denotes amino acid.

Figure 3. Potential of mean force (PMF) profiles for (a) backbone pseudo bond (B−B), (b) backbone pseudo angle (B−B−B), and (c) two side-chain pseudo angles (B−BL−SL, SL−BL−B) for leucine and their corresponding fitting lines.

Figure 4. PMF profiles for (a) four kinds of B−Bp−Bq−B (H−H, S− T, T−S, T−T) dihedral angles, (b) two kinds of Si−B−B−B dihedral angles (i = A, P), and their corresponding fitting results.

−150°. These two wells are related to the α helix and β-sheet structures.70,71 Based on the shapes of the statistical distributions in Figure 4(b), 19 Si−B−B−B dihedral angles for side chains are represented by two distributions, i.e., SA−B−B−B (all amino acids except proline) and SP−B−B−B (only proline). The distributions of Si−B−B−B torsions are generally smoother but also have two wells that depend on the conformations of side chains. The B−Bp−Bq−B and Si-B−B−B torsions are sufficient to determine the torsional conformations for backbones and

side chains, and thus, other redundant dihedral angles, such as S−B−B-S and B−B−B--S, are not needed. All Gaussian parameters in eq 3 for pseudo dihedral angles are given in Tables S4−S5 in the Supporting Information. The vdW parameters are obtained by steering the homodimer of each amino acid along typical directions46 (such as side-by-side way, T-type way, and head-to-head way) and fitting the averaged interaction energy over these directions. We adopt the same vdW parameters for all the backbone CG particles (including Glycine) with Cii = 4.438 Å 6151

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation and εii = 1.269 kcal/mol, which are derived from fitting the Glycine homodimer. The parameters for side-chain CG beads obtained from the procedure described above are listed in Table 2. Figures 5(a) and 5(b) show two examples of deriving

anisotropic and the inclusion of electric multipole moments in the interaction energy is important to describing the anisotropic nature of the electrostatic interaction in the CG model. 3.2. Numerical Validation of TMFF. In order to examine the performance of the TMFF, CG MD simulations up to 87 ns were carried out for six small to medium proteins. These protein systems were recently used to test the FF by Wu’s group16 and Wang’s group.38 For comparison purposes, AA simulations of similar length were also carried out for these six proteins. The RMSD trajectories from CG simulations as a function of time are shown in Figure 6(a), which shows that most protein structures are stabilized after 10 ns MD.

Table 2. Cii and εii Parameters for CG Glycine and 19 SideChain CG Beads

a

ia

Cii (Å)

εii (kcal/mol)

i

Cii (Å)

εii (kcal/mol)

A C D E F G H I K L

3.528 4.107 4.077 4.447 5.213 4.438 5.651 5.483 6.081 6.224

0.370 0.550 1.189 1.432 1.543 1.269 1.177 0.766 0.884 0.645

M N P Q R S T V W Y

5.648 4.575 4.815 4.890 4.860 3.733 4.721 3.765 6.441 4.713

0.793 0.923 0.624 1.207 1.203 0.531 0.461 0.509 1.486 1.564

i denotes amino acid.

Figure 5. VdW potential profiles of the homodimers along different steering directions (side-by-side, T-type, or head-to-head), respectively, for (a) glycine (GLY) and (b) tryptophan (TRP). The black lines represent average vdW potentials. (c) and (d) show electrostatic profiles of the CG homodimer calculated by TMFF (solid lines) and AMOEBA AA FF (dash lines) along steering directions, respectively, for (c) glycine (GLY) and (d) tyrosine (TYR).

Figure 6. (a) Backbone root-mean-square deviation (RMSD) of six tested proteins from 87 ns CG MD simulations. (b) Backbone rootmean-square fluctuations (RMSFs) of six tested proteins from AA (black) and CG (red) MD simulations.

the vdW parameters for glycine and tryptophan. The vdW parameters for backbone and side-chain CG particles here are obtained by calculating the average values of vdW radius and interaction strengths over two different directions of approach. During the CG simulations by TMFF, the electrostatic interaction potentials between CG particles are explicitly described by point multipoles. For the glycine CG homodimer, the electrostatic interactions in the TMFF along both side-byside and T-shape directions are similar, as shown in Figure 5(c), and TMFF shows weaker anisotropy than the AMOEBA AA FF in describing the electrostatic interaction of glycine in this figure. But for residues other than glycine, such as the CG ACETYR-NME homodimer, the electrostatic multipole interaction from the TMFF is highly dependent on the direction of approach, as shown in Figure 5(d), and it shows good agreement with the AA results from the AMOEBA FF in this figure. This suggests that the electrostatic interaction is clearly

Table 3 lists backbone root-mean-square deviation (RMSD) results. As shown in Table 3, the use of the TMFF CG model reduces the number of particles in MD simulation by 11-fold from that of the AA model. Most RMSDs from CG simulations stabilize at 4−5 Å, which is somewhat larger than that from AA simulation, as somehow expected. The larger structural deviation in CG simulation could be attributed to two main reasons. First, the distortions of the original secondary/tertiary structures in our two-bead CG model are relatively easier than in the AA simulation due to the lack of secondary structurespecific interactions, such as hydrogen bonding etc. To remedy this, the Martini FF introduced extra harmonic interaction as residue-pair potentials to strengthen the secondary structures, but it also limits the transferability of the FF, as it requires a priori knowledge of the secondary structures of the protein 6152

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation Table 3. System-Specific Information of Six Tested Proteins and Adenylate Kinase PDB IDs

Protein Residue Number

AA Atom Numbera

CG Particle Numberb

Average AA RMSDc

Average CG RMSD

Scale Factor αd

1D3Z 1FKS 3GB1 1CYE 1FW7 2AAS 1ANK

76 107 56 129 110 124 214

15526 20179 12081 20387 19859 18388 46832

1398 1790 1085 1849 1773 1701 4155

2.5 1.6 1.0 1.4 1.5 3.5 2.8

4.8 5.5 3.7 5.4 6.3 5.0 4.2

3.081 4.435 11.481 5.319 3.005 4.651 3.199

a

AA atom number includes the protein atoms and all TIP3P water molecules in the atom-level system. bCG particle number includes the CG protein molecule and Martini CG water in the CG system. cIn calculating both AA and CG backbone RMSDs, the crystal structure is the reference frame. The RMSD unit is Å. dThe factor α in eq 11 minimizes the difference of RMSF between the AA simulation and the CG simulation.

Figure 7. (a) Backbone RMSDs of 1ANK from, respectively, 25 ns CG and 50 ns AA simulations. (b) Backbone RMSFs of 1ANK from AA (black) and CG (red) simulations. The yellow bars denote NMP and LID domains, respectively. Backbone conformational changes from the crystal structure of two important passages in LID and NMP domains in (c) the AA simulation (blue) and (d) the CG simulation (red).

where χ2 is the variance, α is the scale factor, and n represents Cα number or residue number. Fi,AA and Fi,CG represent the RMSF value of the ith Cα or residue from, respectively, the AA simulation and the CG simulation. The values of α are obtained from the standard least-squares procedure in eq 11 and are listed in the last column of Table 3. As shown in Figure 6(b), most systems during CG simulations display very similar fluctuation shapes to those obtained from AA simulations, especially for 1FKS62 and 2AAS.66 These results suggest that TMFF is able to reproduce the fluctuation dynamics of proteins quite well. We noted that the preservation of RMSD is correlated with the preservation of the secondary structures. The TMFF model

system. Second, the Martini water model used in our TMFF model does not have electrostatic interaction, which is responsible for a less accurate description of the local structures of the protein. In order to compare the fluctuation dynamics between the CG model and the AA model in a sensible way, RMSFs from CG simulations are scaled up by a constant factor α in order to minimize the variance χ2 between two RMSFs, as is done below, n

χ2 =

∑ (Fi ,AA − αFi ,CG)2 i

(α > 0) (11) 6153

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation

and 1ANK,67 by performing both TMFF and AA simulations. In these two comparison simulations, the proteins are placed in the center of a cubic water box with a distance of 15 Å away from the edge of the box. In the AA simulation, we adopted a reduced AMOEBA FF (pmemd.amoeba scripts in Amber14 software60) which includes three regular valence terms, a vdW term, and a permanent-multipole term. In the TMFF simulation, we employed the Martini CG water model and modified pmemd.amoeba scripts in Amber14 software for the TMFF. After the minimization and equilibrium, the CPU costs of 100,000 MD steps by TMFF and AA FF for two systems were recorded using the canonical NPT ensemble at 300 K and 1 bar pressure. The MD time steps used in the AA and CG simulations are 2 fs, and all induced terms in the electrostatic interaction are not included in both simulations. All tests were carried out in a machine with 8 * Intel(R) Xeon(R) CPU E5620@ 2.40 GHz and with printing out once every 500 MD steps. As shown in Table 4, the computational cost by the AA

generally preserves the secondary structures better, thus generating a smaller RMSD. We do find, however, that the agreement of RMSF tends to be somehow anticorrelated with that of RMSD. For example, the comparisons of RMSD and RMSF between the TMFF simulation using the electric multipole model (Figure 6 3GB1 system) and CG simulation using the simple point-charge model (Figure S1 in Supporting Information) for the 3GB1 system indicate this. One possible explanation is that the fitted torsional parameters are somewhat biased toward preserving secondary structures but less optimized for the loop regions. For example, the larger errors in RMSF generally occur in the loop regions. Thus, further work is needed to optimize the torsional parameters for better representation of the torsional structures. 3.3. Conformational Changes of Adenylate Kinase. In this work, we applied the TMFF to study the conformational dynamics of adenylate kinase, which has been extensively used by others to test various CG models. Protein 1ANK is an enzyme with 214 amino acid residues which plays an important role in the signal transduction process that catalyzes the conversion of adenosine triphosphate (ATP) to adenosine diphosphate. Protein 1ANK is usually divided into three functional regions:72,73 the ATP-binding domain (LID domain, residue 127−164), the nucleotide monophosphate-bind domain (NMP domain, residue 31−60), and the Core domain (residues 1−30, 61−126, and 165−214). During the conformation switching between closed state and open state, the LID domain and NMP domain act as the door roles. Here, the closed-state crystal structure (PDBID: 1ANK)67 of adenylate kinase is used as the initial structure in this simulation using TMFF. The CG MD simulation was carried out up to 25 ns while the comparative AA simulation was carried out up to 50 ns. The RMSDs obtained from both CG and AA simulations are listed in Table 3 (bottom row). As expected, the backbone RMSD of the CG simulation is somewhat larger than that of the AA simulation (by about 1 Å). Our result shows that the TMFF is able to stabilize the native structure of the protein during the CG simulation, as shown in Figure 7(a). By analysis of residue-level RMSFs between the CG simulation and the AA simulation, the fluctuation dynamics of two important passages, i.e., the NMP domain and the LID domain, is correctly captured at the CG level, as shown in Figure 7(b). We also studied the conformational changes between these two domains of 1ANK in both AA and CG simulations. The averaged backbone structures of two domains from simulations compared with the corresponding crystal structures are shown in Figure 7(c) (AA simulation) and 7(d) (CG simulation). Figure 7(c) shows small conformational changes at both LID and NMP domains along the closed−open path during the AA simulation. For comparison, the CG simulation shows relatively larger conformational changes of these two domains, as shown in Figure 7(d). The larger fluctuation of these domains observed in the CG simulation is made possible because the free-energy profile is smoother in the CG model, which facilitates the transition between domains. Of course, we should note that the time in the CG simulation is not the real time of the protein movement but accelerated in time scale. 3.4. Efficiency of TMFF. The obvious advantage of the CG model is to improve the computational efficiency. The CG model significantly decreases the number of degrees of freedom and smoothes the potential profiles and largely accelerates sampling. We rigorously tested the computational efficiency of two protein systems, JAC from the benchmark in Amber soft60

Table 4. CPU Costs in TMFF and AA FF Simulationsa System

Particle Number

Force Field

CPU (min)

JAC JAC 1ANK 1ANK

3199 35450 4229 46832

TMFF AAb TMFF AAb

48 1808 59 3539

a The MD time step is 2 fs. bAA FF employed AMOEBA force field, which includes regular bond, regular angle, regular torsion, van der Waals, and permanent-multipole (dipole and quadruple) terms. Other terms are ignored to ensure the same expression as the TMFF.

FF for both systems is about 50 times more expensive than those by the TMFF. Actually, the CG simulation can sample much more than the AA simulation in the same MD steps and accelerate the folding, docking, and other dynamic processes.

4. CONCLUSION In conclusion, we developed a new two-bead FF by introducing electrostatic multipole interactions and reparametrizing other terms, noted as the TMFF. Six proteins were tested to prove the validation of the TMFF, and RMSFs obtained from CG simulations by the TMFF show good agreement with those from AA MD simulations. Then application of the 1ANK system suggests the TMFF is able to capture the conformational changes of important domains well during conformation switching and is potentially extended to explore the conformational dynamics of huge protein systems. Through testing the computational efficiency of the MD simulation for JAC and 1ANK systems, the TMFF can save computational cost by about 50-fold when compared to an atomic-level MD simulation with multipole interactions. Although the TMFF shows promise for great improvement compared to the twobead CG model, more developments are needed. For example, the Martini nonpolarizable CG water model21,22 neglects electrostatic interaction of water molecules. Recently, the newly proposed polarizable CG water models by Marrink26 and Cui74 seem to be promising and could be coupled with the current TMFF to provide more realistic electrostatic interaction in the protein system. In addition, how to deal with the stability of secondary structures is another challenge for the TMFF. Currently, addressing these issues is in progress. 6154

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation



(25) de Jong, D. H.; Singh, G.; Bennett, W. F.; Arnarez, C.; Wassenaar, T. A.; Schäfer, L. V.; Periole, X.; Tieleman, D. P.; Marrink, S. J. J. Chem. Theory Comput. 2013, 9, 687−697. (26) Yesylevskyy, S. O.; Schafer, L. V.; Sengupta, D.; Marrink, S. J. PLoS Comput. Biol. 2010, 6, e1000810. (27) Zavadlav, J.; Melo, M. N.; Marrink, S. J.; Praprotnik, M. J. Chem. Phys. 2015, 142, 244118. (28) Lopez, C. A.; Bellesia, G.; Redondo, A.; Langan, P.; Chundawat, S. P. S.; Dale, B. E.; Marrink, S. J.; Gnanakaran, S. J. Phys. Chem. B 2015, 119, 465−473. (29) Periole, X.; Cavalli, M.; Marrink, S. J.; Ceruso, M. A. J. Chem. Theory Comput. 2009, 5, 2531−2543. (30) Basdevant, N.; Daniel Borgis, A.; Haduong, T. J. Phys. Chem. B 2007, 111, 9390−9399. (31) Ha-Duong, T. J. Chem. Theory Comput. 2010, 6, 761−773. (32) Lamoureux, G.; Roux, B. J. Chem. Phys. 2003, 119, 3025−3039. (33) Lamoureux, G.; Mackerell, A. D.; Roux, B. T. J. Chem. Phys. 2003, 119, 5185−5197. (34) Bahar, I.; Jernigan, R. L. J. Mol. Biol. 1997, 266, 195−214. (35) Kurt, N.; Haliloglu, T.; Schiffer, C. A. Biophys. J. 2003, 85, 853− 863. (36) Mukherjee, A.; Bagchi, B. J. Chem. Phys. 2004, 120, 1602−1612. (37) Mukherjee, A.; Bagchi, B. J. Chem. Phys. 2003, 118, 4733−4747. (38) Gu, J.; Bai, F.; Li, H.; Wang, X. Int. J. Mol. Sci. 2012, 13, 14451− 14469. (39) Shih, A. Y.; Arkhipov, A.; Freddolino, P. L.; Schulten, K. J. Phys. Chem. B 2006, 110, 3674−3684. (40) Zhou, J.; Thorpe, I. F.; Izvekov, S.; Voth, G. A. Biophys. J. 2007, 92, 4289−4303. (41) Levitt, M. J. Mol. Biol. 1976, 104, 59−107. (42) Liwo, A.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S.; Oldziej, S.; Scheraga, H. A. J. Comput. Chem. 1997, 18, 874−887. (43) Golubkov, P. A.; Ren, P. Y. J. Chem. Phys. 2006, 125, 064103. (44) Golubkov, P. A.; Wu, J. C.; Ren, P. Y. Phys. Chem. Chem. Phys. 2008, 10, 2050−2057. (45) Wu, J.; Zhen, X.; Shen, H.; Li, G. H.; Ren, P. Y. J. Chem. Phys. 2011, 135, 155104. (46) Shen, H.; Li, Y.; Ren, P. Y.; Zhang, D.; Li, G. H. J. Chem. Theory Comput. 2014, 10, 731−750. (47) Shen, H.; Li, Y.; Xu, P.; Li, X.; Chu, H.; Zhang, D.; Li, G. J. Comput. Chem. 2015, 36, 1103−1113. (48) Ren, P. Y.; Ponder, J. W. J. Phys. Chem. B 2003, 107, 5933− 5947. (49) Ponder, J. W.; Wu, C.; Ren, P. Y.; Pande, V. S.; Chodera, J. D.; Schnieders, M. J.; Haque, I.; Mobley, D. L.; Lambrecht, D. S.; DiStasio, R. A., Jr.; Head-Gordon, M.; Clark, G. N.; Johnson, M. E.; HeadGordon, T. J. Phys. Chem. B 2010, 114, 2549−2564. (50) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J. W.; Ren, P. Y. J. Chem. Theory Comput. 2013, 9, 4046−4063. (51) Spiga, E.; Alemani, D.; Degiacomi, M. T.; Cascella, M.; Peraro, M. D. J. Chem. Theory Comput. 2013, 9, 3515−3526. (52) Alemani, D.; Collu, F.; Cascella, M.; Dal, P. M. J. Chem. Theory Comput. 2010, 6, 315−324. (53) Cascella, M.; Neri, M. A.; Carloni, P.; Peraro, M. D. J. Chem. Theory Comput. 2008, 4, 1378−1385. (54) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (55) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235−242. (56) Reith, D.; Putz, M.; Muller-Plathe, F. J. Comput. Chem. 2003, 24, 1624−1636. (57) Tozzini, V.; McCammon, J. A. Chem. Phys. Lett. 2005, 413, 123−128. (58) Xia, Z.; Gardner, D. P.; Gutell, R. R.; Ren, P. J. Phys. Chem. B 2010, 114, 13497−13506. (59) Smith, W. CCP5 Newsletter 1998, 46, 18. (60) Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke,

ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00769. Parameters for the B−B−B−B torsion, parameters for B−Bi−Si angles, parameters for Si−Bi−B angles, parameters for B−Bp−Bq−-B torsions, parameters for Si−B− B−B torsions, CG multipoles for amino acids, and RMSD and RMSF for 3GB1 (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (Grant no. 21433004), Ministry of Science and Technology of China (Grant no. 2016YFA0501700), and Shanghai Putuo District (Grant 2014-A-02).



REFERENCES

(1) van Gunsteren, W. F.; Dolenc, J. Biochem. Soc. Trans. 2008, 36, 11−15. (2) Muller-Plathe, F. ChemPhysChem 2002, 3, 754−769. (3) Kolinski, A.; Skolnick, J. Polymer 2004, 45, 511−524. (4) Tozzini, V. Curr. Opin. Struct. Biol. 2005, 15, 144−150. (5) Clementi, C. Curr. Opin. Struct. Biol. 2008, 18, 10−15. (6) Ayton, G. S.; Voth, G. A. Curr. Opin. Struct. Biol. 2009, 19, 138− 144. (7) Sherwood, P.; Brooks, B. R.; Sansom, M. S. Curr. Opin. Struct. Biol. 2008, 18, 630−640. (8) Peter, C.; Kremer, K. Soft Matter 2009, 5, 4357−4366. (9) Trylska, J. J. Phys.: Condens. Matter 2010, 22, 453101. (10) Kamerlin, S. C. L.; Vicatos, S.; Dryga, A.; Warshel, A. Annu. Rev. Phys. Chem. 2011, 62, 41−64. (11) Riniker, S.; Allison, J. R.; van Gunsteren, W. F. Phys. Chem. Chem. Phys. 2012, 14, 12423−12430. (12) Saunders, M. G.; Voth, G. A. Curr. Opin. Struct. Biol. 2012, 22, 144−150. (13) Takada, S. Curr. Opin. Struct. Biol. 2012, 22, 130−137. (14) Noid, W. G. J. Chem. Phys. 2013, 139, 090901. (15) Han, W.; Wan, C. K.; Wu, Y. D. J. Chem. Theory Comput. 2010, 6, 3390−3402. (16) Han, W.; Wan, C. K.; Jiang, F.; Wu, Y. D. J. Chem. Theory Comput. 2010, 6, 3373−3389. (17) Izvekov, S.; Voth, G. A. J. Phys. Chem. B 2005, 109, 2469−2473. (18) Izvekov, S.; Voth, G. A. J. Chem. Phys. 2005, 123, 134105. (19) Li, W. F.; Takada, S. J. Chem. Phys. 2009, 130, 214108. (20) Monticelli, L.; Kandasamy, S. K.; Periole, X.; Larson, R. G.; Tieleman, D. P.; Marrink, S. J. J. Chem. Theory Comput. 2008, 4, 819− 834. (21) Marrink, S. J.; Risselada, H. J.; Yefimov, S.; Tieleman, D. P.; de Vries, A. H. J. Phys. Chem. B 2007, 111, 7812−7824. (22) Marrink, S. J.; de Vries, A. H.; Mark, A. E. J. Phys. Chem. B 2004, 108, 750−760. (23) Marrink, S. J.; Tieleman, D. P. Chem. Soc. Rev. 2013, 42, 6801− 6822. (24) Bulacu, M.; Goga, N.; Zhao, W.; Rossi, G.; Monticelli, L.; Periole, X.; Tieleman, D. P.; Marrink, S. J. J. Chem. Theory Comput. 2013, 9, 3282−3292. 6155

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156

Article

Journal of Chemical Theory and Computation H.; Goetz, A. W.; Homeyer, N.; Izadi, S.; Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Monard, G.; Needham, P.; Nguyen, H.; Nguyen, H. T.; Omelyan, I.; Onufriev, A.; Roe, D. R.; Roitberg, A.; Salomon-Ferrer, R.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; York, D. M.; Kollman, P. A. AMBER 2015; Amber, Inc.: University of California, San Francisco, 2015. (61) Cornilescu, G.; Marquardt, J. L.; Ottiger, M.; Bax, A. J. Am. Chem. Soc. 1998, 120, 6836−6837. (62) Michnick, S. W.; Rosen, M. K.; Wandless, T. J.; Karplus, M.; Schreiber, S. L. Science 1991, 252, 836−839. (63) Kuszewski, J.; Gronenborn, A. M.; Clore, G. M. J. Am. Chem. Soc. 1999, 121, 2337−2338. (64) Santoro, J.; Bruix, M.; Pascual, J.; López, E.; Serrano, L.; Rico, M. J. Mol. Biol. 1995, 247, 717−725. (65) Reibarkh, M. Y.; Vasilieva, L. I.; Schulga, A. A., Kirpichnikov, M. P., Arseniev, A. S. Refined Solution Structure and Backbone Dynamics of 15N-labeled Barnase Studied by NMR (in press). (66) Santoro, J.; Gonzalez, C.; Bruix, M.; Neira, J. L.; Nieto, J. L.; Herranz, J.; Rico, M. J. Mol. Biol. 1993, 229, 722−734. (67) Berry, M. B.; Meador, B.; Bilderback, T.; Liang, P.; Glaser, M.; Phillips, G. N., Jr. Proteins: Struct., Funct., Genet. 1994, 19, 183−198. (68) Tan, C.; Tan, Y. H.; Luo, R. J. Phys. Chem. B 2007, 111, 12263− 12274. (69) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089−10092. (70) DeWitte, R. S.; Shakhnovich, E. I. Protein Sci. 1994, 3, 1570− 1581. (71) Bahar, I.; Kaplan, M.; Jernigan, R. L. Proteins: Struct., Funct., Genet. 1997, 29, 292−308. (72) Lu, Q.; Wang, J. J. Am. Chem. Soc. 2008, 130, 4772−4783. (73) Li, M.; Xu, W. X.; Zhang, J. Z. H.; Xia, F. J. Mol. Model. 2014, 20, 2530. (74) Wu, Z.; Cui, Q.; Yethiraj, A. J. Phys. Chem. B 2010, 114, 10524− 10529.

6156

DOI: 10.1021/acs.jctc.6b00769 J. Chem. Theory Comput. 2016, 12, 6147−6156