Subscriber access provided by Northern Illinois University
Article
Estimation of Relative Protein-RNA Binding Strengths from Fluctuations in the Bound State Zhaleh Ghaemi, Irisbel Guzman, Jung-un Julia Baek, Martin Gruebele, and Zaida Luthey-Schulten J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.6b00418 • Publication Date (Web): 16 Aug 2016 Downloaded from http://pubs.acs.org on August 19, 2016
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
Estimation of Relative Protein-RNA Binding Strengths from Fluctuations in the Bound State Zhaleh Ghaemi,† Irisbel Guzman,‡ Jung-un Julia Baek ,† Martin Gruebele,∗,¶,†,§,k and Zaida Luthey-Schulten∗,¶,†,§,k †Department of Chemistry ‡Department of Biochemistry ¶Department of Physics §Center for the Physics of Living Cells kCenter for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign E-mail:
[email protected];
[email protected] Abstract Protein-RNA complexes are increasingly important in our understanding of cell signaling, metabolism, and transcription. Electrostatic interactions play dominant role in stabilizing such complexes. Using conventional computational approaches, very long simulations of both bound and unbound states are required to obtain accurate estimates of complex dissociation constants (Kd ). Here, we derive a simple formula that offers an alternative approach based on the theory of fluctuations. Our method extracts a strong correlate to experimental Kd values using short molecular dynamics simulations of the bound complex only. To test our method, we compared the computed relative
1
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Kd values to our experimentally measured values for the U1A-Stem Loop 2 (SL2) RNA complex, which is one of the most-studied protein-RNA complexes. Additionally we also included several experimental values from the literature, to enlarge the data set. We obtain a correlation of r= 0.93 between theoretical and measured estimates of Kd values of the mutated U1A protein-RNA complexes relative to the wild type dissociation constant. The proposed method increases the efficiency of relative Kd values estimation for multiple protein mutants, allowing its applicability to protein engineering projects.
1
Introduction
Quantifying computationally how specific amino acid residues in protein-RNA complexes affect the free energy of binding is a challenging problem. The difficulties arise because of the presence of long range electrostatic interactions results in the formation of complexes with high affinities that are very sensitive to specific charged side chains. RNA flexibility compounds the problem, requiring flexible rather than rigid docking algorithms to yield meaningful equilibrium constants, Kd , for complex dissociation. Several experimental techniques are available to measure dissociation constants as a function of protein mutation. 1,2 Experiments give important information about the strength of binding, but they are costly, and the details of interactions of the binding partners remain elusive. 3 Molecular simulations, such as molecular dynamics (MD) and Monte Carlo simulations
4,5
are valuable tools that can shed light on these details. While there are several
advanced sampling techniques that can provide binding free energies, 6–8 they require a careful choice of reaction coordinates that best describe the bound and unbound states and relatively long simulations to reach meaningful convergence of free energy differences. So a key question is whether one can use shorter MD simulations of a single reference state to provide a fast, yet accurate estimate for relative Kd values for protein-RNA complexes with various protein mutants. Here, we propose a theoretical approach for estimation of relative Kd values based on the 2
ACS Paragon Plus Environment
Page 2 of 22
Page 3 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
theory of fluctuations 9 that calculates the dominant electrostatic effects 10 for protein-RNA complexes. Specifically, the change in the free energy of binding for mutations resulting in a change of positive charge on the protein is estimated. Atomistic and force fluctuations already have been used for allosteric pathway detection. 11,12 Especially, important cooperative networks within a protein-RNA complex was revealed by atomistic fluctuations. 13,14 We hypothesize that by quantifying changes in the atomistic fluctuations within the protein in complex with the RNA molecule, it is possible to estimate free energy differences upon protein mutation (relative Kd values). This approach is essentially different from the two commonly used methods; namely, molecular docking, and directly evaluating the free energy differences between the bound and unbound states. Instead, the idea underlying our approach is that the atomistic fluctuations of the protein, quantified by correlations in the bound complex carry sufficient information about its binding strength to the binding RNA. To implement this idea, we derived an analytical formula that relates the binding strengths of mutated complexes (relative to a wild type complex) to the intramolecular correlations. The correlations of several complexes with charge-mutated proteins were computed by running short MD simulation trajectories, and the results were then validated against experimental values. As a model system we chose the U1A protein, an RNA Recognition Motif (RRM), in complex with Stem Loop 2 (SL2) RNA. The U1A protein is a component of the U1 small nuclear ribonucleoprotein subunit of the spliceosome, 15,16 and its N terminus binds with high affinity to SL2 RNA. 17–19 The RRM is one of the most common RNA binding domain in eukaryotes. 20 This complex is the paradigm of protein-RNA binding complexes, 21,22 and offers a large set of mutagenesis experimental data in the literature. To enlarge the data set further, we experimentally measured Kd values for several more U1A mutants by an electrophoretic mobility shift assay (EMSA). 2
3
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
2
Page 4 of 22
Theoretical derivation
The relative Kd values and changes in the binding free energies, ∆∆G, caused by chargemutation of a series of lysine to glutamine residues can be determined as illustrated in Figure 1. Using the mutation site as a reference point for force calculations, we estimate the interactions between atoms (i, j, ...) in the protein and RNA. Electrostatic force is considered as the main interaction between the two binding partners. This approximation is motivated by the highly charged nature of the RNA molecule. An electrostatic force Fi is generated on atom i by the charges of all the atoms on the side chain of the mutated residue. This force varies before and after the mutation (dFi ). It has been shown by Callen 23 that the position variation of atom j in response to a change of force on atom i is determined by the covariance matrix for the fluctuations of atoms i and j from their (ensemble) averaged positions δri = ri − hri i: hδri · δrj i = −RT
∂r j
∂Fi
T,P
(1)
T, P subscripts indicate the system is at constant temperature and pressure. For simplicity, we dropped the notation for the isothermal-isobaric ensemble in the rest of the derivation and rewrite equation 1 as ∆Fi (j) ≈ −RT hδri · δrj i−1 ∆rj
(2)
where hδri · δrj i−1 is the inverse of the covariance matrix and j is called the "observation point" for the calculations (Figure 1). In SI, the details of the covariance matrix form and evaluation are discussed . Because, mutations primarily change the electrostatic force, we substitute the potential energy by the free energy, 10 ∆∆Gi = −∆Fi · ∆ri to get:
∆∆Gi (j) ≈ RT (hδri · δrj i−1 ∆rj ) · ∆ri
4
ACS Paragon Plus Environment
(3)
Page 5 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
∆∆Gi (j), is the change in the free energy of atom i due to the applied force. We sum the contributions from all the atoms i (n), to obtain the total change in free energy:
∆∆G(j) =
n X
∆∆Gi (j) ≈ RT
i
n X
(hδri · δrj i−1 ∆rj ) · ∆ri
(4)
i
Relating the free energy difference to Kd values we finally get: m
n
1 XX Kd W T (hδri · δrj i−1 ∆rj ) · ∆ri . )≈ ln(Kd rel ) = ln( Kd mut m j i
(5)
where Kd W T and Kd mut are the dissociation constants for the wild type (WT) and mutant (mut) protein-RNA complexes, respectively and Kd rel is their relative value. Equation 1 shows that we observe the position variation of atom j under the effect of a mutation. Thus, equations 3 and 4 show dependence on the "observation point" (j). Since no atom "j" is unique to be picked as a reference for this calculation, a possible robust approach is to introduce an averaging scheme for "observation point" (j) (equation 5). The averaging is performed over a subset of j values (m < n), that is determined empirically. This is explained in greater details in the Results section. Intuitively, equation 5 states that the binding strength of the complex, can be determined by the atomistic fluctuations and conformational changes of the bound complex alone. This is exactly true only when interactions within the complex are assumed to be dominated by electrostatic interactions. To asses the validity of equation 5, we computed the Kd values for several positive charge mutants of U1A protein in complex with RNA by performing MD simulations, and compared the results with literature U1A-RNA Kd values, as well as several Kd values newly measured by EMSA in this work to enlarge the data set.
5
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Figure 1: An illustration of the concepts of equations 1-5: the force difference on atoms of the protein (shown as a violet cloud) before and after mutation (∆F1 , ∆F2 , ∆F3 ,..., ∆Fi ) together with the "observation point" (atom j); where the position variation in response to the change of force on atom i is observed.
3 3.1 3.1.1
Materials and Methods Experimental procedures Protein expression and purification
Mutagenesis studies were performed on the 102-amino acid N-terminal U1A fragment, which is the only domain of this protein that binds to the SL2 RNA with high affinity. 17 To enable sensitive fluorescence detection of the U1A protein, we based all our mutants on the F56W mutant as the pseudo wild type. The original phenylalanine and the mutated tryptophan interact with the RNA A6 nucleotide, (see Figure 2), yet F56W retains a high RNA binding affinity. 18,19 The mutated sites were four lysine residues at positions 20, 22, 23 and 50 that have electrostatic interactions with SL2 RNA, but do not form hydrogen bonds with the RNA directly according to the crystal structure. 24 The N-terminal domain of U1A expression vector, was obtained from Nagai 25 and contained a sequence coding for a hexaHis tag at the N-terminus of the protein to facilitate purification. We generated K20Q/F56W, K22Q/F56W, K23Q/F56W, and K50Q/F56W mutants by site-directed mutagenesis and confirmed by sequencing. The expression vectors were transformed into Escherichia coli
6
ACS Paragon Plus Environment
Page 6 of 22
Page 7 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
strain BL21DE3 (pLysS) competent cells. The cells were grown in LB medium, and protein expression was induced with 1 mM IPTG at OD600 = 0.60. The cultures were grown for 5-6 hours after induction. The cells were pelleted, resuspended in 10 mL of lysis buffer (50 mM N aH2 P O4 , 300 mM NaCl, 20 mM imidazole, pH 7.6) and lysed by ultrasonication. The lysate was centrifuged at 10,000 rpm, the supernatant was loaded on a 1 mL Ni-NTA column, and the protein was eluted with elution buffer (50 mM N aH2 P O4 , 300 mM NaCl, 250 mM imidazole, pH 7.6). Eluted protein was dialyzed against storage buffer (10 mM potassium phosphate, 50 mM KCl) and concentrated by amicon filter MWCO 3,000. The concentration of each protein was determined using a bicinchoninic acid (BCA) assay (ThermoScientific). The histidine tag of all U1A mutants was removed by thrombin cleavage. The molecular weights of the purified proteins were confirmed by low-resolution electrospray ionization mass spectrometry and the purities of the proteins were assessed by SDS-PAGE. 3.1.2
Electrophoretic mobility shift assay (EMSA)
Varying amounts of U1A protein were incubated with 50 pM [γ−32 P] ATP-labeled RNA for 30 minutes at room temperature in a buffer containing 10 mM potassium phosphate (pH ◦
7.4) and 200 mM KCl. Electrophoreses were carried out at 25 C for 40 minutes at 350 V with 0.5X TBE as running buffer. The native gels, 8% acrylamide (42:1 = acrylamide: biacrylamide, 15 cm × 40 cm × 1.5 mm), were pre-run at 350 V for 30 minutes before loading 10 µl of reaction mixture per well. The temperature of the gel was maintained ◦
at 25 C by a circulating water bath. Gels were visualized on a Molecular Dynamics Storm phosphorimager. Fraction RNA bound versus protein concentrations were plotted and curves were fitted to the equation: Fraction bound = 1/(1+Kd /[P ]T ) using KaleidaGraph (Synergy Software, PA). All binding measurements were performed with a greater than 10-fold excess of protein over RNA in each binding reaction used to determine the Kd so that [P] would be approximately equal to [P ]T .
7
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Figure 2: A) The U1A protein in complex with the RNA molecule, colored in violet. The mutated protein residues are shown in licorice representation. The SL2 RNA forms important base-stacking interactions with the protein, including W56-A6 interaction. B) The RNA sequence and C) the U1A protein sequence which were used in experiments and simulation studies with the mutated residues colored in red.
3.2 3.2.1
Computational modeling Construction of the computational model systems
The structure of the U1A-SL2 RNA complex is available based on X-ray crystallographic 24 and NMR spectroscopic 26 data. While the crystal structure contains an incomplete Cterminal of U1A protein (only residues 2-98), the NMR structures have the full U1A fragment (residues 2-102), which was also used in our simulations and experiments. Therefore NMR structures were used to construct the protein model. We used multiseq 27 to align the NMR
8
ACS Paragon Plus Environment
Page 8 of 22
Page 9 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
models and model 1 was chosen. It has the minimal root mean square deviation (RMSD) from the rest of the structures. Residue M1 was added to the protein with the xLEaP tool of the Amber program, 28 and F56W, R7Q, K20Q/F56W, K22Q/F56W, K23Q/F56W, K50Q/F56W, K60Q and R70Q mutants were generated by psfgen of NAMD2 29 program for comparison with our data and Laird-Offringa et al. 17 Single mutants are referenced to the wild type; double mutants are referenced to F56W as the pseudo-wild type. The SL2 RNA structure from the crystal structure 24 was added to complete the built of U1A-SL2 RNA complex system. To simulate the experimental SL2 sequence, A1U and U20G mutations were performed using psfgen. 29 Furthermore, sequences of CC and GGG were added to the 3′ and 5′ ends, respectively, using the RNA composer program. 30 The first solvation layers were added with the program Solvate, 31 and the Solvate plugin of VMD 32 was used to solvate the rest of the box with TIP3P water molecules. 33 For F56W, K20Q/F56W, K22Q/F56W, K23Q/F56W, and K50Q/F56W mutants, after neutralizing the systems by addition of K + ions using Ionize, 34 0.2 M KCl was added to the water box to reproduce our experimental condition. For wild type, R7Q, K60Q and R70Q mutants, the systems were neutralized with N a+ ions, and to reproduce the relevant experimental condition, 17 a concentration of 0.15 M NaCl was added. CHARMM36 force fields 35,36 were used for protein and RNA and the determination of the protonation states was done with program PropKa. 37 A stepwise minimization and equilibration were performed as suggested by Eargle et al. 38 with an NPT equilibration of 5 ns. Production runs used periodic boundary condition in the NPT ensemble using a NoséHoover thermostat 39,40 with a coupling constant of 1 ps, and the pressure was controlled with Parrinello-Rahman barostat 41 with a time coupling constant of 1 ps. Hydrogen bonds were constrained to their equilibrium values by LINCS, 42 Lennard-Jones interactions were cut off at a distance of 1.2 nm and the time step was 1 fs. Long-range electrostatics was computed by particle-mesh Ewald algorithm. 43 All simulations were performed with the Gromacs 4.6 package. 44
9
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
3.2.2
Page 10 of 22
Molecular dynamics simulations
We simulated U1A protein in wild type, R7Q, K60Q and R70Q, as well as F56W, K20Q/F56W, K22Q/F56W, K23Q/F56W, and K50Q/F56W mutants in complex with SL2 RNA. To get better statistics on the estimated relative binding affinities, each system was simulated with three 100-nanosecond long replicas. Root mean square deviations (RMSDs) of the residues 8 to 98 of protein backbone and RNA backbone from the equilibrated structure were plotted versus time. Based on them, the last 50 ns of each simulation was used for the analysis. For notational simplicity, the F56W mutations of the double mutants are not listed explicitly in the rest of the manuscript.
4 4.1
Results Experimental results
In order to validate our derivation, we used literature data, 17 and measured additional dissociation constants of a pseudo wild type (F56W) and four mutated U1A proteins based on the pseudo wild type using electrophoretic mobility shift assay (EMSA). 2 For decades this technique have been used to calculate the Kd of protein-DNA and protein-RNA complexes. During this experiment the Stem Loop (SL)-2 RNA was labeled with [γ−32 P] ATP and different U1A concentration were added into each reaction and loaded into a native gel. When the protein concentration was sufficient to detect binding an upper shift in the electrophoretic band is observed (see Methods for details). Four lysine residues, K20, K22, K23 and K50 that have electrostatic interaction with RNA, but according to the crystal structure 24 do not form hydrogen bonds with the RNA were chosen for mutation. Wild type U1A protein has a charge of +7 and the lysine residues were mutated to glutamine to conserve the relative size of the side chain while eliminating the positive charge. The F56W mutation was previously reported to have minimal effect on the binding affinity 18,19 and was
10
ACS Paragon Plus Environment
Page 11 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
successfully used to monitor the dissociation of U1A-SL2 RNA complex. 45 ◦
The results of our experiments show that at 25 C all of the Lys to Gln mutations decreased the binding affinity compared to the F56W mutant (see Figure 3 and supplementary Figure S1). The mutation of K22 and K50 each decreased the binding affinity more than 14-fold. K20Q and K23Q had less significant effects on the binding affinities, with approximately 3 and 6-folds decrease, respectively. Similar trend was previously observed using surface plasmon resonance, under different experimental conditions. 17
Figure 3: a) Native gel and b) binding curve result of K22Q mutant. Binding curves for other U1A mutants are presented in supplementary Figure S1.
To increase the number of experimental Kd values, we also added three more mutants (R7Q, K60Q and R70Q) which were measured by Laird-Offringa et al. at different salt concentration and without the F56W mutation, 17 thus providing an independent data set. These mutations are located far from the RNA binding surface and are predicted to have smaller effects. The Kd values are listed in Table 1 and Figure 2-A shows the position of all mutants in the protein-RNA complex.
11
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Table 1: Experimental Kd values for various U1A mutants. Protein
200 mM KCl (this work)
F56W
2.21 (± 0.09) ×10−8
K20Q
7.4 (± 0.5) ×10−8
K23Q
1.32 (± 0.04) ×10−7
K22Q
3.28 (± 0.06) ×10−7
K50Q
3.92 (± 0.05) ×10−7 150 mM NaCl (ref. 17 )
Wild type
4.0 (± 0.2) ×10−11
R7Q
4.2
5 (± 1) ×10−11
K60Q
3.3 (± 0.3) ×10−11
R70Q
3.4 (± 0.3) ×10−11
Theoretical estimation of relative dissociation constants
We performed a total of three 100 nanosecond-long MD simulations for the wild type and each U1A mutant in complex with RNA under the same conditions as their respective experiments (see Methods for details). The expression in equation 5 was evaluated for each mutant based on the obtained trajectories. The length of the simulations was sufficient to get converged values for ln(Kd rel ) as shown in supplementary Figure S2. Using the program Carma, 46 we aligned the trajectory frames and estimated the covariance matrix considering a coarsegrained representation of the system as protein Cα and RNA phosphate atoms of backbone and nitrogen atoms of the nucleotide bases. The ∆ri,j in equation 5 are the differences between the positions of atoms before and after the mutation. Because of the dependency of these quantities on different conformations of the protein and RNA, we clustered the simulation trajectories in the space of root mean square deviation (RMSD) with respect to the equilibrated structure. We identified three different clusters: most populated, maximum of RMSD (max RMSD) and minimum of RMSD (min RMSD) that are shown in supplementary Figure S3. For these clusters, we evaluated ln(Kd rel ) values by summing i over protein Cα 12
ACS Paragon Plus Environment
Page 12 of 22
Page 13 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
atoms and averaging "observation point" (j) over the secondary structure residues containing the mutated residue. In Figure 4, we plotted the calculated relative Kd values for the most populated cluster alongside experimental results. Good correlation (Pearson r = 0.93) between them suggests the validity of our hypothesis: that fluctuations of the bound state, without knowledge of the unbound state free energy, are sufficient to obtain relative Kd values. The ln(Kd rel ) values for each mutant are reported in Table 2. Using the multiple replica trajectories, we estimated the standard errors of mean values of the theoretical estimates. Table 2 also shows the ratio of the experimental and theoretical Kd rel values. The largest ratio occurs for the K20Q mutant complex which is a factor of 3.8 in Kd rel . For this mutant, in the majority of simulation replicas, a crucial protein helix, helix C (shown in Figure 2-A) changes its conformation. This is mainly caused by the formation of extra salt bridges, present for about 30% of analyzed simulation time, between the lysine residues on the helix C (K96 and K98) and the backbone phosphate group of U8, as shown in Figure 5. This result points out that in addition to the electrostatics, our main focus here, the protein conformation does play an important role in determining the binding affinity. Table 2: Theoretical and experimental ln(Kd rel ) values for the most populated clusters for each mutant with their respective standard error of mean in parentheses. Third column shows the ratio of the experimental and theoretical Kd rel values. Protein Theoretical ln(Kd rel ) value Experimental ln(Kd rel ) value Kd rel,exp /Kd rel,theory K20Q
-2.55 (± 0.550)
-1.208 (± 0.079)
3.827
K23Q
-1.838 (± 0.467)
-1.787 (± 0.051)
1.052
K22Q
-2.607 (± 0.249)
-2.697 (± 0.045)
0.914
K50Q
-2.922 (± 0.281)
-2.876 (± 0.043)
1.047
R7Q
-0.677 (± 0.058)
-0.223 (± 0.206)
1.575
K60Q
-0.395 (± 0.091)
0.192 (± 0.104)
1.798
R70Q
-0.497 (± 0.017)
0.163 (± 0.101)
1.935
Figure 6 shows the theoretical ln(Kd rel ) values estimated from the max RMSD and min 13
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Figure 4: Comparison of Kd rel values from theory calculated for the most populated clusters and the experiment. Error bars on the theoretical values are standard errors of mean values and were evaluated on three estimates. RMSD clusters, together with the most populated cluster values. The ln(Kd rel ) values from these clusters are mostly consistent with the most populated cluster estimates and are within the error of this cluster values. Thus far, we have established the plausibility of equation 5 for relating the relative binding affinities to covariance matrices. This equation is based on the key idea that the correlations of position fluctuations within a complex have signatures of binding. As a further test of this statement, we calculated the difference between the correlation matrices (CKXQ ) of complexes with various binding constants. Specifically, we calculated the Pearson correlation d mut d mut d mut matrices for K50Q ( K = 17.74), R70Q ( K = 0.85) and R7Q ( K = 1.13, slightly Kd W T Kd W T Kd W T
larger Kd than R70Q) complexes. We used program Carma 46 and followed the same protocol for estimations of the covariance matrices. We then estimated the differences between the correlation matrices, namely, CK50Q − CR70Q and CR7Q − CR70Q and counted the maximum difference values and an interval of 0.4 from the maximum. The result of this analysis is shown in Figure 7. It clearly indicates that the change of
14
ACS Paragon Plus Environment
Page 14 of 22
Page 15 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
Figure 5: A) helix C of K20Q mutant complex, shown in tube representation and colored in magenta, changed its conformation with respect to the average F56W structure in blue color. This change caused by the formation of extra salt bridges between the K96 (panel B) and K98 (panel C) residues located on helix C and the backbone phosphate group of U8 of RNA loop. the K50Q correlation matrix is more populated than R7Q (both with respect to the R70Q complex). In other words, the correlation matrix of the weakest binder (K50Q) changed more significantly with respect to the strongest binder (R70Q) than a similarly strong binder (R7Q). The latter trend holds for the entire complex as well as for the U1A protein alone. This finding further supports our hypothesis that the correlated motions within protein-RNA complexes are closely tied to binding strengths of the complexes, and therefore metrics of such correlated motion can be used to estimate relative binding affinities (or equilibrium dissociation constants). Overall, our calculations provide reasonable estimates for ln(Kd rel ) values compared to experimental results, with an average difference of 0.46 on natural log scale for the tested set of protein-RNA complexes having a maximum error of 25% (for K23Q mutant complex) which demonstrates the accuracy of our calculations.
5
Discussion and Conclusion
Using the theory of fluctuations, 23 we derived an expression that relates the correlations within a mutated protein-RNA complex to the change in binding free energy relative to the reference complex (either wild type or pseudo wild type, F56W). The major assumption in the model was that electrostatic interactions dominated, thus one can relate mechanical 15
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Figure 6: Bar graphs showing the experimental and theoretical ln(Kd rel ) values for all clusters, which were determined by the RMSD of each mutant with respect to its equilibrated structure. force and free energy. The approach was tested for a series of protein-RNA complexes using experiments and MD simulations. We showed that the correlated motions within the protein are modulated in accordance with its binding affinity to RNA. Therefore, the correlation changes have signatures of binding strengths. This result affirms our hypothesis that protein fluctuations within the proteinRNA complex alone, without knowledge of the unbound protein + RNA free energy, can be used to estimate relative dissociation constants and hence complex stabilities. Our approach provides a fast method for estimating binding free energies, in comparison with other methods that calculate the free energy difference directly. 6,7 Unlike these methods, our approach does not yield all the details of the binding mechanism because it focuses on equilibrium fluctuations of the bound system only. For now, the proposed approach is limited to cases where binding is dominated by electrostatic interaction changes and such changes affect the intramolecular fluctuations. The approach was tested for two different experiments on multiple mutants of one model proteinRNA complex system, albeit one of the most abundant complexes in eukaryotes. The general applicability of our method to other protein-RNA complexes will require further exploration, 16
ACS Paragon Plus Environment
Page 16 of 22
Page 17 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
Figure 7: Counts of differences between correlation matrices of CK50Q − CR70Q and CR7Q − CR70Q and within 0.4 of the maximum values. The CK50Q matrix changes at many more positions than CR7Q with respect to CR70Q indicating that the intramolecular correlations of the weakest binder change more significantly with respect to the strongest binder. This effect exits both for the entire protein-RNA complexes and proteins. as more complete experimental data sets become available. Our expression for relative binding affinity estimation relies on only the covariance matrix and the averaged positions of the atoms, making it a general approach. Because, the evaluation of equation 5 is not limited to atomistic MD simulation trajectories, it can be applied to data derived from other sampling and coarse-grained techniques as long as the sampling is sufficient to provide reliable ensemble averages. In addition, the force-free energy relation is often accurate even for non-electrostatic systems due to enthalpy-entropy compensation. 47 Hence, the proposed method can be applied to evaluate relative Kd values for large biomolecular complexes for which estimating binding affinities for several mutants is experimentally challenging.
Acknowledgement This work was supported by the National Science Foundation [Grant MCB-1244570 to Z.G. and Z.L-S], the National Institutes of Health [Grant 2R01-093318 to M.G. and I.G.], the
17
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
National Science Foundation [Graduate Research Fellowship DGE-1144245 to I.G.] and the National Science Foundation [Grant MCB-0843728 to J.U.B and Anne M. Baranger]. Supercomputer time was provided by XSEDE [TG-MCA03S027].
Supporting Information Available Electrophoretic mobility shift assay results for all mutated systems, the convergence of ln(Kd rel ) values, the specific form of covariance matrix, details of the cluster analysis and variations of the ln(Kd rel ) values for different observation points. This material is available free of charge via the Internet at http://pubs.acs.org/.
References (1) Pollard, T. D. Mol. Biol. Cell 2010, 21, 4061–4067. (2) Ryder, S. P.; Recht, M. I.; Williamson, J. R. Methods Mol. Biol. 2008, 488, 99–115. (3) Pan, A. C.; Borhani, D. W.; Dror, R. O.; Shaw, D. E. Drug Discovery Today 2013, 18, 667 – 673. (4) Frenkel, D.; Smit, B. In Understanding Molecular Simulation, 2nd ed.; Frenkel, D., Smit, B., Eds.; Academic Press: San Diego, 2002; pp 23–107. (5) Prigozhin, M. B.; Gruebele, M. Phys. Chem. Chem. Phys. 2013, 15, 3372–3388. (6) Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. 2002, 99, 12562–12566. (7) Marinari, E.; Parisi, G. Europhys. Lett. 1992, 19, 451. (8) Darve, E.; Rodríguez-Gómez, D.; Pohorille, A. J. Chem. Phys. 2008, 128, 144120–13. (9) Greene, R. F.; Callen, H. B. Phys. Rev. 1951, 83, 1231–1235.
18
ACS Paragon Plus Environment
Page 18 of 22
Page 19 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
(10) Warshel, A.; Russell, S. T. Q. Rev. Biophys. 1984, 17, 283–422. (11) Sethi, A.; Eargle, J.; Black, A. a.; Luthey-Schulten, Z. Proc. Natl. Acad. Sci. 2009, 106, 6620–5. (12) Stacklies, W.; Vega, M. C.; Wilmanns, M.; Gräter, F. PLoS Comput. Biol. 2009, 5, e1000306. (13) Kormos, B. L.; Baranger, A. M.; Beveridge, D. L. J. Am. Chem. Soc. 2006, 128, 8992–8993. (14) Kormos, B. L.; Baranger, A. M.; Beveridge, D. L. J. Struct. Biol. 2007, 157, 500 – 513. (15) Papasaikas, P.; Valcárcel, J. Trends in Biochem. Sci. 2016, 41, 33 – 45, Special Issue: 40 Years of TiBS. (16) Pomeranz Krummel, D. A.; Oubridge, C.; Leung, A. K. W.; Li, J.; Nagai, K. Nature 2009, 458, 475–480. (17) Law, M. J.; Linde, M. E.; Chambers, E. J.; Oubridge, C.; Katsamba, P. S.; Nilsson, L.; Haworth, I. S.; Laird-Offringa, I. A. Nucleic Acids Res. 2006, 34, 275–85. (18) Kranz, J. K.; Hall, K. B. J. Mol. Biol. 1999, 285, 215–231. (19) Shiels, J. C.; Tuite, J. B.; Nolan, S. J.; Baranger, A. M. Nucleic Acids Res. 2002, 30, 550–558. (20) Cléry, A.; Blatter, M.; Allain, F. H.-T. Curr. Opin. Struct. Biol. 2008, 18, 290 – 298. (21) Maris, C.; Dominguez, C.; Allain, F. H. FEBS J. 2005, 272, 2118–2131. (22) Allain, F. H.; Gubser, C. C.; Howe, P. W.; Nagai, K.; Neuhaus, D.; Varani, G. Nature 1996, 380, 646 – 650.
19
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(23) Callen, H. B. Thermodynamics and an Introduction to Thermostatistics, 2nd ed.; Wiley, 1985; Chapter 19, p 427. (24) Oubridge, C.; Ito, N.; Evans, P. R.; Teo, C.-H.; Nagai, K. Nature 1994, 372, 432–438. (25) Nagai, K.; Oubridge, C.; Jessen, T. H.; Li, J.; Evans, P. R. Nature 1990, 348, 515–520. (26) Allain, H.; Howe, P. W. A.; Neuhaus, D.; Varani, G. EMBO J. 1997, 16, 5764–5772. (27) Roberts, E.; Eargle, J.; Wright, D.; Luthey-Schulten, Z. BMC Bioinformatics 2006, 7, 382–393. (28) Case, D. A.; Darden, T. A.; Cheatham, T. E.; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Walker, R. C.; Zhang, W.; Merz, K. M.; Roberts, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Swails, J.; Goetz, A. W.; Kolossváry, I.; Wong, K. F.; Paesani, F.; Vanicek, J.; Wolf, R. M.; Liu, J.; Wu, X.; Brozell, S. R.; Steinbrecher, T.; Gohlke, H.; Cai, Q.; Ye, X.; Wang, J.; Hsieh, M. J.; Cui, G.; Roe, D. R.; Mathews, D. H.; Seetin, M. G.; Salomon-Ferrer, R.; Sagui, C.; Babin, V.; Luchko, T.; Gusarov, S.; Kovalenko, A.; Kollman, P. A. AMBER 12. 2012; http://ambermd.org/. (29) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. J. Comput. Chem. 2005, 26, 1781– 1802. (30) Popenda, M.; Szachniuk, M.; Antczak, M.; Purzycka, K. J.; Lukasiak, P.; Bartol, N.; Blazewicz, J.; Adamiak, R. W. Nucleic Acids Res. 2012, (31) Grubmuller, H.; Groll, V. Solvate 1.0. http://www.mpibpc.mpg.de/grubmueller/ solvate, 1996, (accessed date 10/14/2015). (32) Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graph. 1996, 14, 33–38. (33) Jorgensen, W. L.; Madura, J. D. J. Am. Chem. Soc. 1983, 105, 1407–1413. 20
ACS Paragon Plus Environment
Page 20 of 22
Page 21 of 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Theory and Computation
(34) Baraeff, A.; Roberts, E. Ionize 1.6.0. http://www.scs.illinois.edu/schulten/ software/mdtools/ionize/index.html, 2005 (accessed date 10/14/2015). (35) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A. D. J. Chem. Theory Comput. 2012, 8, 3257–3273. (36) Denning, E. J.; Priyakumar, U. D.; Nilsson, L.; Mackerell, A. D. J. Chem. Theory Comput. 2011, 32, 1929–1943. (37) Rostkowski, M.; Olsson, M. H. M.; Sø ndergaard, C. R.; Jensen, J. H. BMC Struc. Biol. 2011, 11, 6. (38) Eargle, J.; Black, A. A.; Sethi, A.; Trabuco, L. G.; Luthey-Schulten, Z. J. Mol. Biol. 2008, 377, 1382 – 1405. (39) Nosé, S. The Journal of Chemical Physics 1984, 81, 511–519. (40) Hoover, W. G. Phys. Rev. A 1985, 31, 1695–1697. (41) Parrinello, M.; Rahman, A. J. Appl. Phys. 1981, 52, 7182–7190. (42) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. J. Comput. Chem. 1997, 18, 1463–1472. (43) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089–10092. (44) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. J. Chem. Theory Comput. 2008, 4, 435–447. (45) Anunciado, D.; Dhar, A.; Gruebele, M.; Baranger, A. M. Journal of Molecular Biology 2011, 408, 896 – 908. (46) Glykos, N. M. Journal of Computational Chemistry 2006, 27, 1765–1768. (47) Liu, L.; Guo, Q.-X. Chem. Rev. 2001, 101, 673–696. 21
ACS Paragon Plus Environment
Journal of Chemical Theory and Computation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Graphical TOC Entry
22
ACS Paragon Plus Environment
Page 22 of 22