How the ligand-induced reorganization of protein internal energies is

internal energies is coupled to conformational events ...... IG 15420 and IG 20019; and from the Ministry of Foreign Affairs through the project. PERT...
0 downloads 0 Views 2MB Size
Subscriber access provided by University of Sunderland

Biomolecular Systems

How the ligand-induced reorganization of protein internal energies is coupled to conformational events Giulia Morra, Massimiliano Meli, and Giorgio Colombo J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00195 • Publication Date (Web): 03 Oct 2018 Downloaded from http://pubs.acs.org on October 4, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

How the ligand-induced reorganization of protein internal energies is coupled to conformational events

Giulia Morra1,*, Massimiliano Meli1, and Giorgio Colombo1,2*

1) Istituto di Chimica del Riconoscimento Molecolare, CNR Via Mario Bianco 9, 20131 Milano (Italy) 2) Dipartimento di Chimica, Università di Pavia, Via Taramelli 10, 27100 Pavia, Italy

*) Authors to whom correspondence should be addressed Email: [email protected]; [email protected] Email: [email protected] Tel: +39-02-28500031

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract Here, we introduce a novel computational method to identify the protein substructures most likely to support the functionally-oriented structural deformations that occur upon ligand-binding. To this aim, we study of the modulation of protein energetics along the trajectory of a Molecular Dynamics simulation of different proteins in the presence and in the absence of their respective ligands, namely human FGF, human second PDZ from human PTP1E/PTPL1, and the N terminal domain of human Hsp90. The method is based on the idea that a subset of protein residues (hotspots) may initiate the global response via the disassembly and reassembly of interactions, which is reflected in the modulation of the overall protein energetics. To identify structural hotspots and dynamic states linked to the onset of functionally relevant conformational transitions, we define an energy profile to monitor the protein energetics, based on a previously introduced approach that highlights the essential non-bonded couplings among all residues. The energy profiles are calculated along the trajectory to yield a time dependent evolution and their relative population in the presence and absence of the ligand is evaluated by means of a clustering procedure. It is found that interconversion between clusters, as well as their population and the density of specific energy profiles in the vicinity of structural transitions, provides specific information on the impact of the ligand in driving the protein conformational response. This analysis also highlights the hotspot residues that are most responsive to the presence of the ligand. Importantly, identified hotspots are in agreement with experimental evidence in the three considered systems. We propose that this approach can be generally used in the prediction of ‘allosteric hotspots’ and ligand induced conformational responses, as well as to select conformations more likely to support functional transitions, e.g. in the framework of adaptive sampling approaches.

ACS Paragon Plus Environment

Page 2 of 32

Page 3 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Introduction The energy landscape theory assumes that a protein at equilibrium explores a set of conformational substates in the native thermodynamic basin1. This exploration consists of diffusive rearrangements within an ensemble of local energy minima, as well as quasi-harmonic motions around each minimum. Local minima are separated by free energy barriers of the order of KT, which generate the ruggedness of the native basin 24.

From the point of view of structural changes, these sub-global rearrangements can

involve the rotation of single side chains, typically on timescales of picoseconds, or local collective rearrangements of groups of residues such as loop motions, on longer timescales of nanoseconds. In contrast, functional dynamics that consist of breathing motions associated with enzymatic activity and allostery5,

6,

involve global

conformational changes, such as subdomain rigid rearrangements, which typically occur over longer time scales, from microseconds to seconds, and amount to overcoming higher barriers (several KT) to reach a different thermodynamic minimum basin7. Local and sub-local changes can be directly investigated by means of computational methods like Molecular Dynamics (MD), taking advantage of the all atom description to highlight structural determinants of the related biochemical mechanisms. In contrast large scale functional motions involved in enzyme catalysis or signal transduction, and the allosteric effects triggered by binding specific ligands, are often out of reach for most applications of this kind of approach, unless specific coarse grained methods,

8

bias potentials or adaptive sampling approaches are used to scale

down the computational effort while modeling large scale transitions9,10. However, every functional (large-scale) response to a binding event is ultimately initiated at the local level by the residues that are immediately surrounding and sensing the ligand. Therefore, an interesting question is whether it is possible to identify the subset of protein residues that preside the onset of functional motions induced by ligand binding through the analysis of their microscopic scale motions on the ps-ns scale. Such subset may include the residues that define the binding site and the ones that are distal and respond through allosteric mechanisms. In this work, we develop a computational

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 32

approach based on the analysis of internal energy distribution, and its time evolution, to expose the protein regions that control conformational modulation induced in response to ligand binding.

To this end we introduce the residue-based energy profile as a novel conformational variable to monitor the evolution of a MD trajectory of a protein in its native thermodynamic basin. The residue-based energy profile can be used to recapitulate the essential features of the energetic stability of a given protein conformation. More in detail, we previously showed that the main contributions governing the stability of a structure can be represented by the profile of essential interactions emerging from the Energy Decomposition Method (EDM)11. This representation highlights the main couplings underlying the stability of the conformational state: hotspots mostly contributing to the protein stability in a given structure are associated with peaks in the first eigenvector of the pair interaction matrix

12.

By defining the energy profile via

EDM for each conformational state visited along the MD trajectory, we identify protein regions, whose contribution to stabilization energy remains constant in time and differentiate them from regions that are modulated in response to a ligand, due to formation or disruption of local contacts. The energy profile is therefore hypothesized to work as a magnifying glass able to detect the regions that may undergo or control changes on the microscopic and nanosecond time scale, even in the absence of major structural transitions in the same range. The computational approach is applied to three model systems. They are: the human FGF

13,

in the absence and in the presence of the inhibitor sm27 14, the human second

PDZ from human PTP1E/PTPL1 in complex with RA-GEF2 peptide terminal domain of human Hsp90 in the presence of ATP

16

15

and the N

, or ADP, as well as in the

absence of any ligand. Structures from MD trajectories of the unbound systems are analyzed in terms of the similarities of their energy profiles. The same analysis is repeated on the trajectory in the presence of the ligand. The different energy states are then grouped into clusters. Population shifts within the different energy clusters, or emergence of new clusters, as well as interconversion among states in the presence of

ACS Paragon Plus Environment

Page 5 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the ligand, are analyzed in comparison to the unperturbed system. In parallel to the energy-based evaluation, the structural evolution of the trajectories is evaluated by means of a standard RMSD-based clustering procedure, to complement the information provided by the energy decomposition. We find that in general the same pool of energy states is populated in the apo and holo cases. However, the relative population and persistence of the energy states appears to be significantly modulated by the ligands. The highlighted energy-modulated regions are linked to experimental data, suggesting functional implications for the substructures that respond to the ligand. Interestingly, in proximity of conformational changes, the local density of energy profiles along the trajectory can deviate from the average, supporting the mechanistic role of specific structures in the structural deformations that lead to conformational transitions.

Methods

Molecular Dynamics Simulations. The proteins used in this work were: the Fibroblast Growth Factor 2 (FGF-2), the N terminal domain of HSP90 (NTD of HSP90), and the PDZ-2 domain. The apo structure of FGF-2 was obtained from the Protein Data Bank, code 1BLA

13,.

The starting structure for the sm27-FGF2 complex (HOLO system) was

previously modelled in our group in a previous work using the HADDOCK2.0 software 17

based on chemical shift perturbation data as described by the paper of Pagano et al

18.

The structure of the second PDZ domain (residues 1361−1456) from human PTP1E/PTPL1 in complex with RA-GEF2 peptide was obtained from the Protein Data Bank, code 3LNY

15

. The corresponding apo PDZ2 domain was the associated apo

structure, code 3LNX 15. Finally, the structure of N-terminal of HSP90 in complex with ATP was taken from PDB database, code 3T0Z

16

. The associated structure of the apo N terminal Hsp90

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

corresponds to code 3T0H16. The first eight residues were removed (lowering the number of residues from 213 to 205) due to the high flexibility of the terminal strand. The N-terminal HSP90 in complex with ADP was obtained by replacing the ligand from the crystal structure of the ATP complex.

All MD simulations and standard structural analyses were performed with the AMBER 12 suite of programs 19, using the ff12SB force field and the TIP3P water model and the CUDA implementation for GPUs. All systems were simulated using the same procedure: The systems simulated in apo or holo form are placed in a box large enough to keep at least 1.0 nm of water solvent in each direction. The simulation starts with an unrestrained minimization consisting of 2500 steps of steepest descent followed by 2500 steps of conjugate gradient minimization. The minimized systems are then equilibrated in two cycles at 300K: the first one by keeping the protein and the ligand under a restraint of 100 kcal/mol/A^2 for 10 ns, in the second equilibration the restraint is released for 10ns. All simulations were carried out in the NPT ensemble at 1 atm using the Berendsen coupling algorithms electrostatics

21.

The SHAKE algorithm

22

20,

with full particle-mesh Ewald

was used to constrain all covalent bonds

involving hydrogen atoms. A 2fs time step and a 10 Å cutoff were used for the truncation of VDW non-bonded interactions. The production runs amount to 500ns at 300K. (see Table 1 for details on each system). Energy Decomposition Analysis. From the molecular dynamics trajectories, carried out as described in the section above, 500 frames were extracted for every system, one every nanosecond, for a total of 3500 frames. The use of one snapshot every 1ns proves to be sufficiently accurate to sample the distribution of energy states during the 500ns of each trajectory, for all considered systems. In fact, neither the population of energy states nor the energy profiles defining the cluster centers significantly vary when increasing the sampling to 1 snapshot every 500ps or every 100ps (See Supplementary Figures S1 and S2).

ACS Paragon Plus Environment

Page 6 of 32

Page 7 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Before any energy calculation the 500 selected snapshots were minimized for 1000 step of steepest descent followed by 1000 steps of conjugate gradient. The energy calculation was then performed using a procedure developed in our group: the energy decomposition method, EDM11 12, 23. The EMD is based on the calculation of the interaction matrix Mij, which is determined by evaluating the inter-residue, non-bonded interaction energies (consisting of the van der Waals and coulombic terms) between residue pairs in a given protein conformation. For a protein of N residues, this calculation yields an N×N matrix of pair couplings Mij such that the total inter-residue non-bonded energy of the protein is given by the sum over the matrix entries. We showed that, after diagonalizing the matrix Mij, one can approximate pair couplings using the first eigenvalue lambda and eigenvector w: =  



   

 , 1



 ,

=  

 , 1

 

 1



≅  



 





 

 , 1



1

1 



1 

where the eigenvalues are ranked in increasing order starting from the most negative typically 2 ≫ 

one. The approximation to the first term of this sum is motivated by the fact that

{



1

(see ref.

11).

In the approximated expression, the first eigenvector

} reports on the single residue contributions to the essential stabilization energy 1

(see ref 12). The underlying assumption about excluding the intra-residue couplings

in the calculation is that they do not significantly depend on the protein tertiary conformation, while the inter-residue coupling energy is modulated by the structure. Hence, while non representing the total non-bonded energy of the protein, this term captures the essential structure-dependent modulation of the internal energy11,12.

Therefore, the components of the first eigenvector define a residue-based profile that is an approximated measure of the energy partition within the protein structure. Importantly, solvent effects are taken into account using the molecular mechanics (MM) Poisson-Boltzmann surface area (PBSA) method and a pair decomposition scheme, with a solvation term accounting for the shielding effect of the solvent (see ref. 24).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 32

We calculated the first eigenvector for each snapshot taken every 1ns and obtained a time series of 500 eigenvectors representing the full trajectory. The time series was subjected to clustering analysis, as described in the following, to analyze the distribution of energy eigenvectors.

Root Mean Structure Deviation matrix. A modified version of the GROMACS tool g_rms

25

was used to generate the RMSD similarity matrix used for the RMSD

clustering. Here, each element represents the structural deviation between any two frames of the trajectory. The RMS values were calculated just on CA atoms of the trajectory resulting from the union of the apo and holo minimized set of frames for the same system, hence using 1000 snapshots for the calculation. For the NTD of HSP90 the concatenation of apo, ADP and ATP states yields 1500 frames. Each matrix element was calculated by using this formula:     

1,  2

=   1



 1

‖  (

1) −

  (

2 )‖

2

where n is the number of considered atoms and ri(t) is the position of atom i at time t.

Clustering analysis. The clustering analysis was done using the K-means algorithm PAM implemented in the R statistical computing package [https://www.Rproject.org/]. This algorithm groups the data by partitioning it around medoids that are defined iteratively

26

. We used the pamk function, based on the k-medoid

clustering method PAM, which estimates the optimal number of clusters by calculating the average silhouette width 27.

ACS Paragon Plus Environment

Page 9 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Number System

Ligand

water

of Total Number of atoms

molecules

HSP90

APO

8260

28013

ADP

8245

28019

ATP

8361

28372

APO

5715

24393

SM27

4590

23812

APO

4998

16390

RA-GEF2

5177

17014

FGF

PDZ2 Table 1. Summary of simulated systems

Results The three systems considered in this study, and simulated in the absence and in the presence of bound ligands, were chosen based on the requirement of being mono domain proteins, to facilitate the energetic analysis and at the same time test heterogeneous types of ligands. For each complex, a simulation of 500ns was generated as described in the Methods section; snapshots were extracted every 1ns, which proved to be a sufficiently dense sampling interval (see Methods and Supplementary Figures S1 and S2) then minimized before calculating the energy profile. The energy decomposition approach applied to each snapshot (see Methods) yielded a time dependent energy profile. The two time series of apo and holo energy profiles were concatenated into a single time series, which was subsequently subjected to cluster decomposition to highlight the most representative energy configurations (see Methods for details). For each cluster, the overall population was calculated, and the transition probabilities among clusters were also evaluated. The latter satisfy the

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

detailed balance, indicating that the cluster interconversion occurs at equilibrium over the simulated time. The persistence of each cluster, defined as the probability of staying in the same cluster at the subsequent step, was also calculated, and is shown to roughly correlate to the cluster population (see Supplementary Figure S3). In the following we set out to analyze in detail each system separately, addressing these questions: 1) What are the dominant energy profiles and where do differences arise? 2) How does the ligand affect the profile modulation and how does this relate to the molecular recognition and to triggering of functional motions? 3) What can be learned from the energy profile in terms of conformational dynamics of the protein, in comparison to a standard RMSD-based analysis?

ACS Paragon Plus Environment

Page 10 of 32

Page 11 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1. Histograms showing the relative fractional population of the different energy clusters obtained by clustering decomposition of the concatenated trajectories in the three considered systems: top: apo and sm27-bound FGF, middle: apo and RA-GEF2 bound PDZ2 domain, bottom: apo, ATP, and ADP-bound N terminal domain of Hsp90.

Apo vs Inhibitor bound FGF The 500ns trajectories in the apo and holo forms of FGF were processed as described in Methods. The resulting time evolution of energy profiles for the concatenated trajectory was used to construct a similarity matrix and decomposed into clusters. The dominant energy profiles are populated in the apo and holo state as shown in Figure 1, with an optimal decomposition into three clusters. The number and nature of populated clusters is not affected by the ligand, as both the apo and the holo state occupy all three sets. However, relative to the apo state, the presence of the inhibitor significantly increases the population of cluster 2, which becomes dominant, with a concomitant decrease of population mainly of cluster 1. The kinetic transition matrix reflects the population change between holo and apo, In general, the interconversion rate between the states (see Supplementary Table S1) is also modulated by the ligand and follows the population trend. The profile of the representative clusters is shown in Figure 2B. Cluster 1 highlights the segment of residues 85 to 123 as main contributor to stability (see red area on the structural representation in Figure 2A). This region comprises a significant part of the hydrophobic core of the protein, characterized by hydrophobic contacts among aromatic residues (such as Phe102 to Phe104, highlighted as grey surface), as well as a solvent exposed region containing electrostatic interactions between residues Lys86, Glu87, Glu100 and Tyr120. This energy state is dominant in the absence of the inhibitor and partially overlapped to cluster 3 (Figure 2B). In contrast, the bound inhibitor strongly promotes the population and persistence of cluster 2 (see Figure 1), characterized by the stabilization of segment 36 to 60, comprising the turn 35-39, directly facing the ligand. (see Figure 2C and for the persistence, supporting information). The peaks of the representative eigenvector are indeed located few

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

residues away from the binding surface, and are determined by the electrostatic interaction between residues Arg42 and Asp50, which suggests an increased stability of the area. Cluster 3, whose peaks are partially overlapped to those of cluster 1, gains the relative stabilization of solvent exposed region 85-95 while reducing the contribution of core residues comprising segment 95-115. Therefore, we hypothesize that cluster 3 represents the energy state with a partially destabilized core. The different stability regions of cluster 1 and cluster 2, indicating the energy modulation, are reported on the structure with a color code associating red with decreased energy contribution and blue with increased stability upon binding (Figure 2B).

Overall, we observe that the energy profile resembles the experimentally

determined dynamic modulation induced by the ligand at specific positions, such as the increased backbone dynamics at res. 111, 90, 120, and the decreased motions at residue 5218.

ACS Paragon Plus Environment

Page 12 of 32

Page 13 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 2. A structural representation of the sm27-bound FGF showing in red the region stabilized in cluster 1 and in blue the region stabilized in cluster 2; B residue-based profile of the representative eigenvectors of the three energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation.

Next, we set out to associate the evolution of the energy states with the analysis of the conformational dynamics of the protein. In parallel with the energy-based clustering, structural cluster decomposition by means of an RMSD based similarity matrix was carried out on the same pool of structures (see Methods). We find that, whereas the holo state populates only one cluster, the apo state shows the emergence of a second conformational cluster after 200ns of MD, characterized by a different arrangement of loop 65-75 (Supplementary Figure S4). This segment is located next to the stabilization core emerging in energy cluster 1 that is destabilized in clusters 2 and 3, residues 95115.

In order to reconnect the structural change with the energetic description, we investigated the features of energy states in the vicinity of a structural change, as highlighted by the RMSD clustering. To this end, we measured the population of the energy clusters in the 5ns preceding and the 5ns following the structural transitions and compared it to the average population. We find that energy cluster 3 is overrepresented, compared to the average population, in the vicinity of the transition (Figure 2C). This is consistent with the markedly reduced presence of cluster 3 in the holo state, where no structural transition occurs. Therefore, the data suggests that energy cluster 3 captures a subset of “activated” structures that facilitate the structural transition of the loop, through a mechanism of destabilization of core segment 95-115, coupled to the transition of the adjacent loop 65-75. (Figure 2B).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The convergence timeline of the cluster population (Supplementary Figure S5) reveals that the decomposition into energy clusters reaches convergence within 100ns along the MD simulation of the holo system (showed in SI, upper plots). This indicates that the energy equilibration is rather fast, compatible with a local amino acid rearrangement. The apo protein shows a similar convergence profile of the energy decomposition. In contrast, the RMSD-based clustering is characterized by the second cluster emerging after a longer time (200ns). This suggests that the structural transition follows the local stability modulation.

Overall, we can conclude that in FGF the energy clusters are modulated upon ligand binding, and the energetic modulation can be reconnected to structural heterogeneity. Moreover, we observe that structures belonging to the energetic cluster characterized by a destabilized folding nucleus are more frequently found in the vicinity of conformational changes.

Apo vs RA-GEF2 peptide bound PDZ2 domain The energy-based protocol described above was subsequently applied to two 500ns trajectories of a different system, namely the apo PDZ2 and the RA-GEF2 peptide bound PDZ2 domain (see Methods). We found here an optimal decomposition into four dominant energy profiles (Figure 4B), which show only minor differences with one another, concentrated in two distinct hotspots. These hotspots are the region surrounding loop30 and, on the opposite side, the C-terminal side of helix H1 (see Figure 3A). The role of three Arg residues, forming ion pairs with Glu residues, appears essential in modulating the eigenvector profiles. More in detail, in state 2 the loop region around residue 30 appears relatively stabilized, due to electrostatic interactions

ACS Paragon Plus Environment

Page 14 of 32

Page 15 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

between residues Arg31 and a network including His32, Tyr36 and Glu90 (Figure 3A). In eigenvector 3, Arg57 and Glu67 also contribute to peaks and add a stabilizing interaction with the backbone of Arg31. In cluster 4, Arg31 is completely detached and does not participate in the network (see Supplementary Figure S7). At the same time, the region around residue 50 is stabilized by electrostatic interactions between Arg51 and Glu10 in eigenvectors 1 and 4 but not in 2,3 The population of the energy clusters is modulated in the simulations of the apo and holo forms, with strongly increased preference for cluster 4 in the presence of the ligand and of clusters 2 and 3 in the ligand free system (Figure 1). Based on this analysis, the main modulation induced by the ligand emerges by comparing cluster 4 and 3, and entails the modulation of the ion pair Arg51-Glu10 (region highlighted in red in Figure 3A), the stabilization of the pair Arg57 and Glu67 and at the same time a partial displacement of loop 30, visible as detachment of Arg31 from Glu90 (see Figure 3B). Both hotspots have been repeatedly identified in previous studies in PDZ2 as ligand modulated regions

28,29

. The loop 30 hotspot belongs to the binding site and is

expectedly associated to dynamical perturbation in the presence of the ligand, as probed by NMR

30;

the other hotspot belongs to an allosterically responsive region

originally defined in the statistical coupling analysis of Lockless and Ranganathan 31, 32. Kong and Karplus 33 identified the same two hotspots as elements constituting a cluster of correlated energy fluctuations in PDZ2. Moreover, the relevance of electrostatics in determining the energy redistribution in the homolog PDZ3 domain has been proposed as the mechanism underlying its dynamic allosteric behavior 34 35. Correspondingly, our clustering approach identifies Arg51 as one major modulating factor in the internal energy partition.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3. A structural representation of the RAF-G2 bound PDZ2 domain showing in red the region stabilized in cluster 1 and 4 and in blue the region stabilized in cluster 2,3, representing the ligand bound and the apo state respectively; B residue-based profile of the representative eigenvectors of the four energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation.

Notably, the RMSD-based clustering delivers two distinct clusters, one mainly populated in the apo state and the other for the holo state. Compared to the energy clusters, the ensembles obtained by the RMSD criterion only differ at the loop regions, specifically loop 10-19, which comprises the highest amplitude change in the structure during dynamics. This structural fluctuation is prevented in the ligand bound state due

ACS Paragon Plus Environment

Page 16 of 32

Page 17 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

to the interaction between Ser17 and the C terminus of the peptide (Supplementary figure S6). Interestingly, loop 10-19 is adjacent to the electrostatic interaction, Arg51Glu10, whose modulation is found to be ligand dependent according to the energy clustering. We then repeated the analysis of the occurrence of energy states in the proximity of a structural transition: we found that here the local distribution does not significantly deviate from the average, suggesting that no specific “activated state” is identified, in contrast to the case of FGF (Figure 3C). We hypothesize that this might be a consequence of the prompt convergence of the simulations (see the convergence timeline plot of RMSD-based clustering, Supplementary Figure S8) indicating no significant conformational transitions along the trajectory.

Correpondingly, the time

dependence of the energy clustering decomposition shows a slower convergence in this case, in comparison to the FGF. Here, in the apo system, the configuration corresponding to cluster 2 is populated in the vicinity of the starting structure but cluster 3 progressively dominates. Overall, like in the previous system, the energy decomposition highlights ligand-based modulation and can integrate structural analysis: in particular, the analysis of energy profiles in the vicinity of structural transitions shows no deviation from the equilibrium distribution.

Apo vs ATP or ADP bound N terminal domain of Hsp90. This example extends the application of the energy based clustering to the differentiation among multiple ligands, in addition to the comparison to the apo state. The Delta8 N-terminal domain of the Heat shock protein Hsp90, whose crystal structure was originally solved in the presence of ATP (see Methods), was simulated in the ATP, in the ADP and in the apo state for 500ns. Here we aim not only at validating the analysis of ligand-induced energy modulation with a third example, but also to test

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

whether the method is able to discriminate between different ligands that are known to trigger different conformational responses 36-38. 39

Figure 4. A structural representation of the ATP bound N terminal domain of Hsp90 showing in red the region stabilized in cluster 1 and in blue the region stabilized in cluster 3, representing the apo and the ATP/ADP bound state respectively; the ATP lid is shown in yellow. B residuebased profile of the representative eigenvectors of the three energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation. This shows the relevance of cluster 3 in the vicinity of a transition in the ATP bound state.

The analysis procedure, carried out like in the previous cases on the three concatenated trajectories, leads us to identifying three energy clusters (Figure 1, 4B). In the first cluster the interaction between three residues, namely the ATP coordinating Glu47, the nearby residue Arg46 and Ser129 belonging to the ATP lid, dominates the profile. Another contribution emerges from the pair Glu42-Arg202 (Figure 4A, red

ACS Paragon Plus Environment

Page 18 of 32

Page 19 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

region). Cluster 2 loses any electrostatic coupling hotspots, as indicated by the absence of high peaks. Here the energy profile is dominated by the central strand of the beta sheet, specifically residues 140 to 160, as well as other portions of the hydrophobic core such as 90 to 100 ( Figure 4A). Finally in cluster 3 the electrostatic interaction between Ser129 and Arg46 is destabilized due to the increased distance between Ser129 and the pair Glu47-Arg46 (from 7 to 9.2 AA on average). In contrast, the main peak of the eigenvector is due to the ion pair Asp191-Lys193, which is associated with a higher structural rigidity of this segment (Figure 4A, blue region) Cluster 1 is the most populated configuration in the APO state (Figure 1), while cluster 2 dominates ADP state (Figure 1). Cluster 3 is the most representative for ATP, but is almost as populated for ADP. Therefore, the analysis over 500ns is able to discriminate between the apo and the ligand bound states. Interestingly, from the functional point of view, the N terminal stabilization region around Glu46 is found to be extremely mutation-sensitive in

40

in the yeast homolog.

The evolutionary conservation and sensitivity to mutations is high around the ATP binding site, which includes the region that is modulated going from cluster 1 to cluster 3. This suggests a functional role of the local modulation induced by the ligand, which is highlighted through the energy clustering. In particular, the destabilization of the interaction network involving the C terminal helix (Figure 4A) may be hypothesized to be connected to the large-scale domain motions regulated by the nucleotide exchange in full length Hsp90. Other experimental data

41

show that the region emerging as

stability peak in cluster 1 contains sites of post-translational modification, such as Tyr 38, which is phosphorylated by Wee1. The RMSD based clustering, carried out on the same dataset, decomposes the concatenated trajectory of apo, ADP and ATP bound Hsp90 into 5 separate clusters, which differ mainly for the N terminal strand and the position of the ATP lid, particularly of its N terminal part. Clusters 1 and 2, representative of the ATP state, show a more extended N terminal end of the ATP lid .coupled to the position of Lys112, interacting with the phosphate groups of the ligand, when present. Cluster 3 and 5, in which the ATP lid is coordinated through the interaction between Ser129 and Glu46-

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Arg47, are mainly populated in the apo state. Cluster 4, populated by ADP, is in in an intermediate position (See Supplementary Figure S9). In order to connect the results of energy-based analyses to the conformational transitions described by the RMSD clustering, as in the previous cases, we applied the analysis of the population of energy states in the vicinity of a structural transition.

In contrast to the apo system, where the distribution of energy states near transitions does not deviate from the average, for the ATP-bound Hsp90 the population of energy state 3 is locally increased in the vicinity of the transition. This suggests that the perturbation induced by ATP differs from the one induced by ADP, for which cluster 1 is dominant, and induce a different “activated” state. Finally, the analysis of the time dependence of the energy cluster population reveals that the apo system reaches a convergence distribution within 100ns, with cluster 1 and cluster 3 populated at equilibrium and cluster 2 reduced to a low population. In contrast, the ATP and ADP simulation are characterized by a similar progressive decrease of cluster 1 population, and a concomitant increase of cluster 3, reaching equilibrium later in the simulation. Interestingly the convergence of RMSD based clusters, which evolve towards different structures in the three ligand states (see Supplementary Figure S10), lags behind energy clustering by about 100ns.

Once more, we observe distinct patterns of organization of local interactions in response to the presence of a certain ligand. Such differential energy patterns characterize ensembles that are over-represented in proximity of structural reorganizations, showing that fold stability changes induced by a specific ligand may favor switching from one state to another.

Discussion and Conclusions

ACS Paragon Plus Environment

Page 20 of 32

Page 21 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

In this work we have explored, in a number of examples, a novel clustering method based on the similarity of the principal energy profiles between two snapshots of a MD trajectory. The analysis of energy profiles, obtainable from the Energy Decomposition Method (EDM), shows how the distribution of energy states can be modulated by the presence of a ligand. The reorganization of local interactions (including those distant from the ligand-recognition site) resulting from the binding event highlights the protein regions that underlie the onset of conformational changes, even in the absence of major macroscopic structural reorganizations35. Indeed we observe in the three considered cases that the dominant energy cluster changes from apo to ligand bound, indicating that the method is able to capture a modulation that occurs even without leaving the conformational basin of the starting structure. Also, we note that, despite the significant contribution of non-polar interactions defining the hydrophobic core of the proteins, the method predominantly highlights the ligand-induced modulation of electrostatic pair interactions, whose presence or absence turns on and off the peaks in the profile. In particular, Arginine residues are often involved in peaks due to their significant contribution both to Coulomb and to Van der Waals terms. This was observed for FGF, and particularly for PDZ and Hsp90, where the interconversion between energy states is strongly affected by different combinations of electrostatic interactions between pairs of charged residues. This highlights the role played by electrostatic interactions as spy-sensor residues, identifying the points of onset of local conformational changes. Importantly, the functional relevance of this energetic modulation and of the regions highlighted by the method is supported by evidence in the literature, as shown in the various examples in the Results section.

We further investigated the relationships between energy clusters and conformational changes, asking whether structural transitions, highlighted by RMSD-based clustering, might be associated to specific (modulations of) energy configurations. To this end, we evaluated the local density of energy states populated in the vicinity of a structural

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 32

transition. This provided evidence that structural changes, detected as transitions to a different RMSD based cluster, can in some cases be associated with a specific energetic signature. Therefore, one might consider defining an ensemble of “activated states” more likely to evolve through structural transitions. The underlying concept is that favorable transitions between conformations may occur through the locally perturbed or locally unfolded states. The lower stability of the starting state could favor sampling of alternative structures, ultimately resulting in a reduction of the barriers to be overcome in undergoing conformational changes. The concept of linking stability and local unfolding or conformational rearrangements has been widely investigated by experimental and computational approaches. In this framework, the lower (higher) relative contribution to the stability of certain structures would favor sampling of alternative states, thereby resulting in conformational transitions42 43 44 45 . Our approach extends the reach of these concepts to the realm of all atom simulations and allows to identify the reassembly mechanism of interaction patterns as well as ensembles of structures that may act as “activated states” to trigger conformational changes.

In the presence of conformational changes that populate structural clusters along the trajectory, the convergence profiles of the energy-based clustering appear generally to reach equilibrium faster. This is in agreement with the hypothesis that the energy modulation mainly involves local rearrangements of pair interactions, which might promptly respond to the perturbation induced by the ligand and anticipate larger scale structural changes. As such, the energy clustering method offers a complementary point of view on the ligand-induced modulation. Recently developed methods combine the use of massive MD sampling on the microsecond scale with a kinetic analysis of conformational transitions based on Markov state models. A critical step in this approach is the dimensionality reduction required to manage the data and to focus on relevant degrees of freedom involved in the transition under study. To this end, a way to highlight large-scale transitions involving groups of atoms can be either PCA, which identifies the highest amplitude collective motions, or TiCA

ACS Paragon Plus Environment

46, 47,

Page 23 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

which seeks collective motions occurring on the longest time scales explored in the simulations. Both approaches focus on physiologically relevant global conformational transitions, either searching for the largest amplitude or for the longest time scale, therefore ignoring local, short time scales in response to a ligand.

Here, we suggest the opportunity of complementing this point of view by focusing on the energetics of local changes, both as a means of identifying residues functionally critical in the conformational response upon ligand binding, and as a tool to select possible “activated” structures prone to conformational changes. From the practical point of view, the energy based analysis here applied to monodomain proteins can be straightforwardly extended to multidomain systems, using the approach that was developed in 48 to evaluate the energy profile. We envision that this information can be proficiently used to define optimal starting structures for enhanced sampling applications, ranging from adaptive and smoothing simulations to the use of simplified energy profiles as appropriate descriptors of collective variables.

In conclusion, in the quest for methods for managing large scale simulation data and high throughput screening efforts as well as for modelling the impact of chemically induced changes in proteins

49,

we have described one of the first semiquantitative

attempts to evaluate the ligand-induced internal energetic reorganization that underlies the activation of functionally oriented conformational changes, through simple, unbiased all-atom MD simulations. Our model and approach have shown promise in detecting the regions and the reassembly of local interaction patterns, determined by different factors (structure, packing, electrostatics), that are relevant for initiating conformational events in different systems. Importantly, our findings are fully consistent with experimental characterizations of the regions that underpin structural deformation.

Acknowledgments

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

We acknowledge funding from Associazione Italiana Ricerca sul Cancro (AIRC) Grants IG 15420 and IG 20019; and from the Ministry of Foreign Affairs through the project PERTNET.

References

1. Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G., Funnels, pathways, and the energy landscape of protein folding: a synthesis. Proteins: Struct. Funct. Genet. 1995, 21, 167-195. 2. Micheletti, C.; Lattanzi, G.; Maritan, A., Elastic properties of proteins: Insight on the folding process and evolutionary selection of native structures. J. Mol. Biol. 2002, 321, 909-921. 3. Onuchic, J. N.; Nymers, H.; Gracia, A. E.; Chahine, J.; Socci, N., The energy landscape theory of protein folding: insight into folding mechanisms and scenarios. Adv. Protein Chem. 2000, 53, 87-152. 4. Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D., Navigating the folding routes. Science 1995, 267, 1619-1620. 5. Popovych, N.; Sun, S.; Ebright, R. H.; Kalodimos, C. G., Dynamically driven protein allostery. Nat. Struct. Mol. Biol 2006, 13, 831-838. 6. Tsai, C. J.; del Sol, A.; Nussinov, R., Protein allostery, signal transmission and dynamics: a classification scheme of allosteric mechanisms. Mol. BioSyst. 2008, 5, 207-216. 7. Henzler-Wildman, K.; Kern, D., Dynamic Personalities of proteins. Nature 2007, 450, 964-972. 8. Tobi, D.; Bahar, I., Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state. Proc. Natl. Acad. Sci. USA 2005, 102, 18908. 9. Laio, A.; Parrinello, M., Escaping free-energy minima. Proc. Natl. Acad. Sci. USA 2002, 99, 12562-12566. 10. Bowman, G. R.; Ensign, D. L.; Pande, V. S., Enhanced Modeling via Network Theory: Adaptive Sampling of Markov State Models. J Chem Theory Comput 2010, 6, 787-794. 11. Tiana, G.; Simona, F.; De Mori, G. M. S.; Broglia, R. A.; Colombo, G., Understanding the determinants of stability and folding of small globular proteins from their energetics. Protein Science 2004, 13, 113-124.

ACS Paragon Plus Environment

Page 24 of 32

Page 25 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

12. Morra, G.; Colombo, G., Relationship between energy distribution and fold stability: Insights from molecular dynamics simulations of native and mutant proteins. Proteins: Struct. Funct. and Bioinf. 2008, 72, 660-672. 13. Moy, F. J.; Seddon Ap Fau - Bohlen, P.; Bohlen P Fau - Powers, R.; Powers, R., High-resolution solution structure of basic fibroblast growth factor determined by multidimensional heteronuclear magnetic resonance spectroscopy. Biochemistry 1996, 35, 13552-61. 14. Colombo, G.; Margosio, B.; Ragona, L.; Neves, M.; Bonifacio, S.; Annis, D. S.; Stravalaci, M.; Tomaselli, S.; Giavazzi, R.; Rusnati, M.; Presta, M.; Zetta, L.; Mosher, D. F.; Ribatti, D.; Gobbi, M.; Taraboletti, G., Non-peptidic thrombospondin-1 mimics as fibroblast growth factor-2 inhibitors: an integrated strategy for the development of new antiangiogenic compounds. J. Biol. Chem. 2010, 285, 87338742. 15. Zhang, J.; Sapienza, P. J.; Ke, H.; Chang, A.; Hengel, S. R.; Wang, H.; Phillips, J., George N; Lee, A. L., Crystallographic and Nuclear Magnetic Resonance Evaluation of the Impact of Peptide Binding to the Second PDZ Domain of Protein Tyrosine Phosphatase 1E. Biochemistry 2010, 49, 9280-9291. 16. Li, J.; Sun, L.; Xu, C.; Yu, F.; Zhou, H.; Zhao, Y.; Zhang, J.; Cai, J.; Mao, C.; Tang, L.; Xu, Y.; He, J., Structure insights into mechanisms of ATP hydrolysis and the activation of human heat-shock protein 90. Acta Biochimica et Biophysica Sinica 2012, 44, 300-306. 17. Dominguez, C.; Boelens, R.; Bonvin, A. M. J. J., HADDOCK: a protein-protein docking approach based on biochemical and/or biophysical information. J. Am. Chem. Soc. 2003, 125, 1731-1737. 18. Pagano, K.; Torella, R.; Foglieni, C.; Bugatti, A.; Tomaselli, S.; Zetta, L.; Presta, M.; Rusnati, M.; Taraboletti, G.; Colombo, G.; Ragona, L., Direct and Allosteric Inhibition of the FGF2/HSPGs/FGFR1 Ternary Complex Formation by an Antiangiogenic, Thrombospondin-1-Mimic Small Molecule. PLoS ONE 2012, 7, e36990. 19. Case, D. A.; Darden, T. A.; Cheatham, T. E. I.; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Walker, R. C.; Zhang, W.; Merz, K. M.; Roberts, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Swails, J.; Goetz, A. W.; Kolossváry, I.; Wong, K. F.; Paesani, F.; Vanicek, J.; Wolf, R. M.; Liu, J.; Wu, X.; Brozell, S. R.; Steinbrecher, T.; Gohlke, H.; Cai, Q.; Ye, X.; Wang, J.; Hsieh, M.-J.; Cui, G.; Roe, D. R.; Mathews, D. H.; Seetin, M. G.; Salomon-Ferrer, R.; Sagui, C.; Babin, V.; Luchko, T.; Gusarov, S.; Kovalenko, A.; Kollman, P. A., AMBER 12. University of California, San Francisco. 2012. 20. Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Di Nola, A.; Haak, J. R., Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984, 81, 3684-3690. 21. Darden, T.; York, D.; Pedersen, L., Particle mesh Ewald: An N-log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98. 22. Gunsteren, W. F. v.; Berendsen, H. J. C., Algorithms for molecular dynamics and constraint dynamics. Mol. Phys. 1977, 34, 1311.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

23. Genoni, A.; Morra, G.; Merz, K. M. M. J.; Colombo, G., Computational Study of the Resistance Shown by the Subtype B/HIV-1 Protease to Currently Known Inhibitors. Biochemistry 2010, 49, 4283-4295. 24. Scarabelli, G.; Morra, G.; Colombo, G., Predicting interaction sited from the energetics of isolated proteins: a new approach to epitope mapping. Biophys. J. 2010, 98, 1966-1975. 25. Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E., GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 2008, 4, 435-447. 26. Kaufman, L., Rousseeuw, P.J, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley: New York, 1990. 27. Rousseeuw, P. J., Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl. Math. 1987, 20, 53-65. 28. Morra, G.; Genoni, A.; Colombo, G., Mechanisms of Differential Allosteric Modulation in Homologous Proteins: Insights from the Analysis of Internal Dynamics and Energetics of PDZ Domains. J Chem Theory Comput. 2014, 10, 5677-5689 29. Conti Nibali, V.; Morra, G.; Havenith, M.; Colombo, G., Role of Terahertz (THz) Fluctuations in the Allosteric Properties of the PDZ Domains. J. Phys. Chem. B. 2017, 121, 10200-10208. 30. Fuentes, E. J.; Der, C. J.; Lee, A. L., Ligand dependent dynamics of intramolecular signaling in a PDZ domain. J. Mol. Biol. 2004, 335, 1105-1115. 31. Lockless, S. W.; Ranganathan, R., Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286, 295-299. 32. Fuentes, E. J.; Gilmore, S. A.; Mauldin, R. V.; Lee, A. L., Evaluation of Energetic and Dynamic Coupling Networks in a PDZ Domain Protein. J. Mol. Biol. 2006, 364, 337-351. 33. Kong, Y.; Karplus, M., Signaling pathways of PDZ2 domain: A molecular dynamics interaction correlation analysis. Proteins 2009, 74, 145-154. 34. Kumawat, A.; Chakrabarty, S., Hidden electrostatic basis of dynamic allostery in a PDZ domain. Proc. Natl. Acad. Sci. USA 2017, 114, E5825. 35. Liu, J.; Nussinov, R., Energetic redistribution in allostery to execute protein function. Proc. Natl. Acad. Sci. USA 2017, 114, 7480. 36. Richter, K.; Muschler, P.; Hainzl, O.; Buchner, J., Coordinated ATP hydrolysis by the Hsp90 dimer. J. Biol. Chem 2001 276, 33689-33696 37. Immormino, R. M.; Dollins, D. E.; Shaffer, P. L.; Soldano, K. L.; Walker, M. A.; Gewirth, D. T., Ligand-induced conformational shift in the N-terminal domain of GRP94, an Hsp90 chaperone J. Biol. Chem 2004, 279 46162-46171. 38. Pearl, L. H.; Prodromou, C., Structure and mechanism of the Hsp90 molecular chaperone machinery. Annu. Rev. Biochem. 2006 75, 271-294 39. Colombo, G.; Morra, G.; Meli, M.; Verkhivker, G. M., Understanding ligandbased modulation of the Hsp90 molecular chaperone dynamics at atomic resolution. Proc. Natl. Acad. Sci. USA 2008, 105, 7676-7681.

ACS Paragon Plus Environment

Page 26 of 32

Page 27 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

40. Mishra, P.; Flynn, J. M.; Starr, T. N.; Bolon, D. N. A., Systematic Mutant Analyses Elucidate General and Client-Specific Aspects of Hsp90 Function. CellReports 2016, 1-12. 41. Mollapour, M.; Neckers, L., Post-translational modifications of Hsp90 and their contributions to chaperone regulation. Biochim. Biophys. Acta 2012, 1823, 648-655. 42. Schrank, T. P.; Bolen, D. W.; Hilser, V. J., Rational modulation of conformational fluctuations in adenylate kinase reveals a local unfolding mechanism for allostery and functional adaptation in proteins. Proc. Natl. Acad. Sci. USA 2009, 106, 16984-16989. 43. Miyashita, O.; Onuchic, J. N.; Wolynes, P. G., Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proc. Natl. Acad. Sci. USA 2003, 100, 12570-12575. 44. Wei, G.; Xi, W.; Nussinov, R.; Ma, B., Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116, 6516-6551. 45. Hilser, V. J.; Wrabl, J. O.; Motlagh, H. N., Structural and Energetic Basis of Allostery. Annu. Rev. Biophys. 2012, 41, 585-609. 46. Perez-Hernandez, G.; Noe, F., Hierarchical Time-Lagged Independent Component Analysis: Computing Slow Modes and Reaction Coordinates for Large Molecular Systems. J Chem Theory Comput 2016, 12, 6118-6129. 47. Perez-Hernandez, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noe, F., Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 2013, 139, 015102. 48. Genoni, A.; Morra, G.; Colombo, G., Identification of Domains in Protein Structures from the Analysis of Intramolecular Interactions. J. Phys. Chem. B. 2012, 116, 3331-3343. 49. Melaccio, F.; del Carmen Marin, M.; Valentini, A.; Montisci, F.; Rinaldi, S.; Cherubini, M.; Yang, X.; Kato, Y.; Stenrup, M.; Orozco-Gonzalez, Y.; Ferrè, N.; Luk, H. L.; Kandori, H.; Olivucci, M., Toward Automatic Rhodopsin Modeling as a Tool for High-Throughput Computational Photobiology. J Chem Theory Comput 2016, 12(12), 6020-6034.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

TOC

ACS Paragon Plus Environment

Page 28 of 32

Page 29 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Histograms showing the relative fractional population of the different energy clusters obtained by clustering decomposition of the concatenated trajectories in the three considered systems: top: apo and sm27-bound FGF, middle: apo and RA-GEF2 bound PDZ2 domain, bottom: apo, ATP, and ADP-bound N terminal domain of Hsp90. 215x166mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. A structural representation of the sm27-bound FGF showing in red the region stabilized in cluster 1 and in blue the region stabilized in cluster 2; B residue-based profile of the representative eigenvectors of the three energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation. 336x213mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 30 of 32

Page 31 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A structural representation of the RAF-G2 bound PDZ2 domain showing in red the region stabilized in cluster 1 and 4 and in blue the region stabilized in cluster 2,3, representing the ligand bound and the apo state respectively; B residue-based profile of the representative eigenvectors of the four energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation. 311x239mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

A structural representation of the ATP bound N terminal domain of Hsp90 showing in red the region stabilized in cluster 1 and in blue the region stabilized in cluster 3, representing the apo and the ATP/ADP bound state respectively; the ATP lid is shown in yellow. B residue-based profile of the representative eigenvectors of the three energy clusters; C red, cluster population in the vicinity of a structural transition (from 5ns before to 5 ns after the transition, see text); blue, average cluster population in the vicinity of a randomly picked set of snapshots, with error given by the standard deviation. This shows the relevance of cluster 3 in the vicinity of a transition in the ATP bound state. 318x219mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 32 of 32