Article pubs.acs.org/JPCB
Recognition Mechanism between Lac Repressor and DNA with Correlation Network Analysis Lishi Xu,† Wei Ye,† Cheng Jiang,† Jingxu Yang,† Jinmai Zhang,† Yan Feng,*,‡ Ray Luo,*,§ and Hai-Feng Chen*,†,∥ †
State Key Laboratory of Microbial Metabolism, Department of Bioinformatics and Biostatistics, College of Life Sciences and Biotechnology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai, 200240, China ‡ State Key Laboratory of Microbial Metabolism, Department of Biotechnology, College of Life Sciences and Biotechnology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai, 200240, China § Departments of Molecular Biology and Biochemistry, Chemical Engineering and Materials Science, Biomedical Engineering, University of California, Irvine, Irvine, California 92697-3900, United States ∥ Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 200235, China S Supporting Information *
ABSTRACT: Lac repressor is a DNA-binding protein which inhibits the expression of a series of genes involved in lactose metabolism. Lac repressor can bind at a random DNA site via nonspecific interactions; then, it rapidly translocates through the double chain of DNA until it finds the specific binding site. Therefore, the site transform between these two modes is essential for the specific recognition between Lac repressor and DNA. Here, the recognition mechanism between Lac repressor and DNA was illustrated with molecular dynamics simulations and correlation network analyses. We have found that the correlation network of the specific system (2KEI) is more centralized and denser than that of the nonspecific system (1OSL). The significant difference in the networks between the nonspecific and specific systems is apparently due to the different binding modes. Then, different interaction modes were found where electrostatic and hydrogen bonding interactions in the nonspecific system are stronger than those in the specific system. Hydrophobic interactions were found only in specific complexes and mostly focused on the hinge helices. Furthermore, the hinge helix will induce the bending of DNA for the specific system. At the same time, a common specific sequence of DNA was revealed for three specific systems. Then, two design systems (positive and control) were used to evaluate the specific recognition between DNA and Lac repressor. These combined methods can be used to reveal the recognition mechanism between other transcription factors and DNA.
■
Lac repressor from the nonspecific site to the specific one.9 However, the recognition mechanism between the nonspecific and specific systems is still unsolved. In order to reveal the molecular mechanism, these questions need to be answered: (1) What is the driven force for specific recognition? (2) If it exists a common specific DNA sequence during Lac repressor recognition? (3) How to confirm the common specific sequence of DNA? To answer these questions, the specific and nonspecific complexes of Lac repressor and DNA were used in this study. The complex consisted of four monomers and two molecules of DNA; that is, each dimer binds with one DNA (shown in Figure 1).10−14 In order to simplify the model, we just used the dimer system in the future work. The recognition complex
INTRODUCTION Protein−nucleic acid interactions are responsible for the regulation of fundamental biological functions, such as transcription, translation, replication, and recombination. To perform these functions, DNA-binding proteins need to search their specific binding site from a large amount of nonspecific sequences. It has been demonstrated that proteins can find their target sequence at much faster rates than those of random diffusion.1−4 The previous works have supported that there are two binding modes for the recognition process between DNA and protein. At first, protein binds at a random DNA site via nonspecific interactions; then, it rapidly translocates through the double chain of DNA, until it finds the specific binding site.5−7 The complex between Lac repressor and DNA is an ideal and prototypical system to study the protein−DNA recognition because this DNA-binding protein has been structurally solved in their nonspecific and specific states.8 Therefore, there is some research about the transformation of © XXXX American Chemical Society
Received: October 31, 2014 Revised: January 28, 2015
A
DOI: 10.1021/jp510940w J. Phys. Chem. B XXXX, XXX, XXX−XXX
Article
The Journal of Physical Chemistry B
and one right monomer mutant based on 2KEI. In order to evaluate the common specific sequence of DNA, we designed and built one positive complex between DNA with a common specific sequence and Lac repressor from a specific system and one control complex between DNA from a nonspecific system and Lac repressor from a specific system. Two design systems included one design specific system (Design-SP) and one design nonspecific system (Design-NP). HP62 V52C mutant was used in this study, because it can bind to DNA with an affinity similar to that of the intact Lac repressor and form stable complexes with DNA in both binding modes.17 Mutants of Y17F were conducted with PyMOL, and the structure of the designed DNA sequence was generated using the Amber NAB package. All simulations and analysis procedures were conducted using the AMBER12 software package.18 Hydrogen atoms were added using the LEaP module of AMBER12. Counterions were used to maintain system neutrality. All systems were solvated in a truncated octahedron box of TIP3P waters with a buffer of 10 Å. Particle mesh Ewald (PME)19 was employed to treat long-range electrostatic interactions with the default setting in AMBER12. The ff99SB force field was used to model wild types and mutants. In order to compare the performace of different force fields, ff99bsc0 was also used in this study for two systems.20 Bonds involving hydrogen atoms were constrained using the SHAKE algorithm.21 2000-step steepest descent minimization was performed to relieve any structural clash in the solvated systems and 20000-step was lengthened for mutant, followed by heating up and brief equilibration for 20 ps in the NPT ensemble at 298 K with pmemd.cuda of AMBER12. Langevin dynamics with a time step of 2 fs were used in the heating and equilibration runs with a friction constant of 1 ps−1. Simulations were done at 298 K for specific and nonspecific systems with 100 ns for each system. A total of 1.3 μs trajectories was collected for these solvated systems, taking about 1560 GPU hours. Detailed simulation conditions are listed in Table 1. Data Analysis. Tertiary contact assignment was completed with in-house software.22−25 Interaction with charged residues and nucleotides was considered when the distance between the mass center of positive charged residues and the DNA phosphate backbone was less than 11 Å. Hydrophobic residues and nucleotides were in hydrophobic contact when mass centers of their side chains were closer than 6.5 Å for the complex. A hydrogen bond was defined that the distance between two polar heavy atoms either with a hydrogen atom
Figure 1. Complex of Lac repressor and DNA: (A) tetramer Lac repressor and DNA; (B) structure of HP62.
consisted of the two monomers, and each monomer included the first 62 residues of the N-terminal (headpiece 62, HP62).15 The HP62 fragment of Lac repressor consists of helix α1 from residues 5 to 14, α2 from residues 16 to 25, and α3 from residues 31 to 45. These helices form a helix-turn-helix domain for DNA binding. Residues 47−62 at the C-terminal are disordered. Residues 50−58 will fold into the α helix, which is named the hinge helix, upon binding to the specific sequence of DNA.12,14 In this study, we compared the dynamic features of nonspecific and specific systems with all-atom molecular dynamics (MD) simulations. On the basis of the ensemble of trajectories, we constructed dynamics correlation networks for nonspecific and specific systems of Lac repressor and DNA complex. On the basis of the analyses and comparisons of the correlation networks between these complexes, a hypothesis was proposed that the similar structures and different binding modes were linked with different correlation networks; e.g., the binding between the hinge region of Lac repressor and DNA could perturb the correlation network of protein, and change the function of the protein, i.e., their binding affinities to DNA.
■
MATERIALS AND METHODS Molecular Dynamics Simulations. Four wild type (WT), five mutants, and two design systems were used in this study. Four WT systems were derived from the PDB; other systems were built based on WT. Four WT systems included one nonspecific system (pdb code: 1OSL)8 and three specific systems (pdb code: 2KEI, 2KEK, and 1CJG).16 Five mutants consisted of four left monomer mutants corresponding to WT Table 1. Simulation Conditions for All Systems system
force field
WT
ff99BSC0 ff99SB
mutant
design
system
simulation time (ns)
1OSL 2KEI 1OSL 2KEI 2KEK 1CJG 1OSL_Y17F_Left 2KEI_Y17F_Left 2KEK_Y17F_Left 2KEK_Y17F_Right 1CJG_Y17F_Left Design-SP Design-NP
100 100 100 100 100 100 100 100 100 100 100 100 100 B
ions 30 39 30 39 39 38 30 39 39 40 38 38 30
+
Na Na+ Na+ Na+ Na+ Na+ Na+ Na+ Na+ Na+ Na+ Na+ Na+
water
water box size (Å3)
11227 18991 11227 18991 22418 16546 11036 17954 23963 22410 16327 17162 12127
543938 871887 543938 871887 1016834 776557 418760 604926 839096 790474 597313 624869 474003 DOI: 10.1021/jp510940w J. Phys. Chem. B XXXX, XXX, XXX−XXX
Article
The Journal of Physical Chemistry B
Figure 2. Structural variation for four wild types: (A) Cα RMSD vs simulation time of four wild type systems; (B) Cα RMSF vs simulation time of four wild type systems.
connected to a given node, was used to indicate the importance of the node. After the network construction completed, network topological analyses were performed using Cytoscape3.1.1.36 Similar to the definition of topological parameters in a social network,37 network centralization means how equally the central nodes distribute. The network with higher centralization is more central on a few important nodes. Density means the proportion of real edges out of all possible edges, which indicates how easily the information could transfer in the network. Higher heterogeneity indicates the network is more likely to consist of different components. The shortest path between any two nodes was calculated using the Floyd−Warshall algorithm.38 For community analyses, the Girvan−Newman algorithm39 was utilized with the network tools developed by the Luthey-Schulten group.35,39 The detail analysis step for the correlation network was as follows: The input data were the conformers of nonspecific and specific systems at room temperature. Covariance matrices were first used to mark the correlation between all residues and nucleotides. Next, Cytoscape3.1.1 was used to build the correlation network. Then, the Floyd−Warshall algorithm was used to search the shortest pathway. Finally, the Girvan− Newman algorithm was used to handle community analysis. Global Multidimensional Scaling (MDS) Analysis. MDS40 was used to reduce the dimensionality of the data by minimizing the Sammon stress function through a steepest descent procedure. The RMSD between any two conformers (i and j) along the 298 K trajectories was defined as Dij. Supposing N conformers are extracted from the trajectories, the conformers are mapped as N points (xi, yi) on a twodimensional plane. The Euclidean distance of any two points is defined as dij = [(xi − xj)2 + (yi − yj)2]1/2. This allows the distances between any two points to be proportional with the corresponding RMSD values. In order to reduce dimensionality, the Sammon stress function was used to optimize the mapping between the N structures and the N points by minimizing the error E as defined as
was less than 3.5 Å. All the 3D molecular representations were visualized and rendered with PyMOL (http://pymol.org), and the 2D map was plotted with Origin 9.2. The DNA bending angle and width of the major/minor groove were calculated using ptraj of AMBER12.26 Landscapes were used on distance variations to detect the structural adjustment between WT and mutant, as well as design the specific and nonspecific complex. The distances between every pair of residue and base were calculated in every system, respectively. Then, the landscapes of residue-to-base distances in the mutant structure minus the corresponding distances in WT structures were plotted with Origin 9.2. Principle Component Analysis. The transformation and movement between specific and nonspecific binding were crucial for the protein to locate its target through thousands of DNA sequences. We employed principal component analysis (PCA)27−30 to discover the principal movement of these two binding modes. The PTRAJ module of AMBER12 was used to extract eigenvalues and corresponding eigenvectors from a covariance matrix and to calculate the contribution of each principle component. Three-dimensional structure snapshots along each principle component were projected with in-house software. The final results were visualized using PyMOL with 10 snapshots along each principal component. Correlation Network Analysis. Correlations between all pairs of residues and nucleotides in 2KEI, 1OSL, and other complexes were calculated from the covariance matrix31−34 Cij =
⎯r (t )⟩ ⟨Δ ri(⃗ t ) ·Δ→ j 2 → (⟨Δ ri(⃗ t )⟩ ⟨Δ ⎯rj(t )⟩2 )
where Δri⃗ (t) = ri⃗ (t) − ⟨ri⃗ (t)⟩, ⟨·⟩ means the time averaging, and ri⃗ (t) means the position of node i at time point t. In this study, we constructed the correlation-based networks 35 using covariance matrices along the last 50 ns in each trajectory, with 40 ps per snapshot. Every amino acid is defined as one node, and every nucleotide is defined as two nodes with the nucleotide backbone as one and the base group as the other. Besides nodes, an edge is defined between any two nodes without a covalent bond but with heavy atoms closer than 4.5 Å over 75% sampling time. The strength of the edge is defined as the absolute value of the intranode correlation (Cij) between the two nodes. The number of the edges for each node is defined as the degree of the node. The correlation-weighted degree, which is the summation of strengths of all the edges
E=
N
1 N
N
∑i = l ∑i < j Dij
N
∑∑ i=l i