Elucidating Allosteric Communications in Proteins with Difference Contact Network Analysis

Letter

Elucidating Allosteric Communications in Proteins with Difference Contact Network Analysis
Xin-Qiu Yao, Mohamed Faizan Momin, and Donald Hamelberg
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00250 • Publication Date (Web): 29 Jun 2018
Downloaded from http://pubs.acs.org on June 30, 2018

Journal of Chemical Information and Modeling

Elucidating Allosteric Communications in Proteins with Difference Contact Network Analysis

Xin-Qiu Yao*, Mohamed Momin, Donald Hamelberg*

Department of Chemistry, Georgia State University, Atlanta, Georgia 30302-3965, USA.

*Correspondence to: Dr. Xin-Qiu Yao and Prof. Donald Hamelberg; Department of Chemistry, Georgia State University, 29 Peachtree Center Ave NE, Atlanta, Georgia 30303-2515, USA. Telephone: (404) 413-5564; E-mail: [email protected]; [email protected].

ACS Paragon Plus Environment

ABSTRACT

A difference contact network analysis (dCNA) method is developed for delineating allosteric mechanisms in proteins. The new method addresses limitations of conventional network analysis methods and is particularly suitable for allosteric systems undergoing large-amplitude conformational changes during function. Tests show that dCNA works well for proteins of varying sizes and functions. The design of dCNA is general enough to facilitate analyses of diverse dynamic data generated by molecular dynamics, crystallography, or nuclear magnetic resonance.

Introduction

Allosteric regulation is the primary means by which a cell orchestrates many subcellular processes. The regulation manifests as remote communication through a biological molecule, i.e. ligand binding or another perturbation at one site influences activity at a distal site. Allostery is also an important factor that must be considered in current and future rational drug discovery and protein design.1,2 Besides fundamental thermodynamic models of allostery,3-5 a large number of computational methods have been developed to elucidate the atomistic mechanisms underlying allosteric regulation. Despite variations in detail, many of these methods employ a network representation and graph theory to uncover the relationship between protein structure and function.6-18 A central hypothesis of these methods is that allosteric coupling can be explained by the propagation of information through a network of residue-residue interactions. In the network, each node usually represents a residue, and the edges between nodes are determined by either geometric6-15 or energetic16,17 criteria. Among these methods, the dynamical network analysis originally proposed by Luthey-Schulten and colleagues,11 which has been applied to distinct biological systems,11,19-23 represents a significant advance. A main feature of the dynamical network analysis is the use of dynamic information, measured by the residue-residue cross-correlation of positional fluctuations derived from molecular dynamics (MD) simulations, to weight network edges. Since the dynamic nature of proteins is the key determinant underlying allosteric effects,24-26 the dynamical network analysis provides a more accurate picture of allosteric communications than protein structure/contact networks.

Once a network is constructed, many network properties can be examined and further linked to protein stability, function, and allosteric regulation.18 One such important property is the partition of residues into segments, or communities, which reflects the intrinsic modular structure of the system. Communities provide
a simplified representation of protein structure and dynamics, facilitating the elucidation of allosteric communication pathways, especially for large systems.

Most conventional network analysis methods, including the dynamical network analysis, examine either a static structure or an ensemble of conformations generated by a single simulation (representing a specific functional state). Ensembles obtained under distinct conditions (e.g. apo and holo) are usually compared only after a network has been constructed and analyzed for each individual ensemble, for example by comparing communities11 or shortest paths7 derived from different simulations. Since a network determines how allosteric signals are transmitted through a molecule, comparing network properties amounts to inspecting how allosteric mechanisms are altered upon, e.g., mutations or ligand binding. In other words, the allosteric coupling described by each network is assumed to be fully explained by thermal fluctuations around a single native state, an assumption justified only for allosteric transitions without conformational changes (or alterations of the network topology).5 Moreover, in the conventional community analysis, both the community partition and the links between communities (determined by the sum of betweenness of intercommunity edges in the dynamical network analysis) vary because of slight changes of the network topology across conformational ensembles. Although the variation of the community partition may reflect the underlying alteration of allosteric mechanisms, it impedes the comparison of community-level dynamical changes between multiple ensembles, such as simulations of distinct functional states, wild type and mutants, homologs, or different ligand/effector binding conditions, where the definition of communities must be the same across ensembles. Figure S1 shows an example of community analysis using the dynamical network analysis, where the comparison between ensembles is difficult because of inconsistent community partitions.

Here, we propose a new network analysis method, named difference contact network analysis (dCNA), which takes two or more simulation-generated conformational ensembles as input simultaneously and produces a community partition that is consistent across ensembles, enabling a more effective evaluation of intercommunity changes. Because of its inherent ability to process multiple ensembles, our method is suitable for systems undergoing significant conformational changes during allostery. A similar idea has been implemented previously by exploiting dynamic information from crystallographic data.27-30 In particular, a quaternary (community-like) network was integrated with tertiary (residue-residue contact) motions to obtain a global communication network underlying allosteric regulation.30 Our method performs a similar multiscale integration of information but in a simpler manner. In addition, our method can process more than just active and inactive conformations, and can analyze different types of dynamic data, including MD simulations, crystallographic structures, and NMR chemical shift correlation matrices.31-33 We test our method on three biological systems of varying sizes and functions. The method is general enough to be applied to other systems.

Results and Discussion

Difference contact network analysis

We propose a difference contact network analysis method to dissect conformational dynamics conveyed in residue-residue contacts (Figure 1). The “difference” here means a comparison of two or more simulation-generated conformational ensembles. The method takes at least two conformational ensembles as input. The probability of occurrence of residue-residue contacts during each simulation is calculated first (Figure 1A). Two residues are considered in contact at a
simulation frame if their minimal nonhydrogen atomic distance is at most 4.5 Å. Then, a consensus unweighted contact network is constructed, in which each edge represents a stable (covalent or noncovalent) contact whose contact probability is at least 0.9 across all input simulations (Figure 1B). The Girvan-Newman algorithm34 is applied to detect residue communities in the consensus network: within a community residues are densely connected, whereas between communities residues are loosely connected (Figure 1C). The number of communities is determined by examining the network modularity in a manner similar to that previously described.23 Briefly, the most coarse-grained partition with a modularity value close to the maximal modularity is chosen, which avoids generating too many small communities with little improvement in modularity. Despite this guideline, the number of communities is somewhat arbitrary and is left to users to adjust. In our view, increasing or reducing the number of communities is equivalent to zooming in or out of the underlying structure, which merely provides different levels of detail but does not change the inherent hierarchical modular organization (Figure S2). On the other hand, from the residue-residue contacts of individual ensembles, one can calculate a residue-wise difference contact network, in which each edge is weighted by the contact probability change from one network to the other (Figure 1D). At this step, only nonlocal residue pairs separated by at least three residues in sequence (i.e., i to i+n, n≥3) are considered. Finally, a community-community difference contact network is constructed, in which nodes represent communities and edges are weighted by the net contact probability changes between communities, i.e. the sum of the weights of all edges linking two communities in the residue-wise difference network (Figure 1E). The dCNA method provides a coarse-grained description of protein structure that is consistent across ensembles.
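
The workflow described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work (the analyses here rely on Bio3D); all function names and the toy data are hypothetical. The sketch assumes that contacts have already been detected per frame (e.g. with the 4.5 Å criterion), that residues belong to a single chain with consecutive numbering, that a "stable" contact must reach the probability cutoff in every ensemble, and that the community assignment has been obtained separately (e.g. by Girvan-Newman on the consensus network).

```python
from collections import defaultdict

def contact_probabilities(frames):
    """frames: list of sets of residue pairs (i, j), i < j, in contact per frame."""
    counts = defaultdict(int)
    for contacts in frames:
        for pair in contacts:
            counts[pair] += 1
    return {pair: c / len(frames) for pair, c in counts.items()}

def consensus_edges(prob_list, pc=0.9):
    """Pairs whose contact probability is >= pc in every ensemble (assumption)."""
    shared = set.intersection(*(set(p) for p in prob_list))
    return {pair for pair in shared if all(p[pair] >= pc for p in prob_list)}

def difference_network(probs1, probs2, nc=3):
    """Edge weight df = f2 - f1 for nonlocal pairs (|i - j| >= nc)."""
    diff = {}
    for i, j in set(probs1) | set(probs2):
        if abs(i - j) < nc:
            continue  # local pair, skipped at this step
        df = probs2.get((i, j), 0.0) - probs1.get((i, j), 0.0)
        if df != 0.0:
            diff[(i, j)] = df
    return diff

def community_difference(diff, community_of):
    """Sum residue-level df over all edges linking two different communities."""
    net = defaultdict(float)
    for (i, j), df in diff.items():
        ci, cj = community_of[i], community_of[j]
        if ci != cj:
            net[tuple(sorted((ci, cj)))] += df
    return dict(net)

# Toy input: two ensembles of four frames each (hypothetical data).
frames_1 = [{(1, 2), (1, 5)}, {(1, 2), (1, 5)}, {(1, 2)}, {(1, 2), (1, 5)}]
frames_2 = [{(1, 2), (4, 8)}, {(1, 2), (4, 8)},
            {(1, 2), (1, 5), (4, 8)}, {(1, 2), (4, 8)}]
community_of = {1: "A", 2: "A", 4: "A", 5: "B", 8: "B"}  # assumed partition

p1 = contact_probabilities(frames_1)
p2 = contact_probabilities(frames_2)
print(consensus_edges([p1, p2]))          # {(1, 2)}
d = difference_network(p1, p2)
print(community_difference(d, community_of))  # {('A', 'B'): 0.5}
```

In this toy case the contact 1-5 is partly lost (df = -0.5) and the contact 4-8 is formed (df = +1.0), so the net change between communities A and B is +0.5, i.e. a net contact formation from network 1 to network 2.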

Although we suggest values for the three model parameters, i.e. the distance cutoff to define a contact (dc=4.5 Å), the minimal formation probability for a stable contact (pc=0.9), and the minimal amino acid separation in sequence for a nonlocal residue pair (nc=3), these values are not unique and might need to be adjusted for specific cases. In fact, various values of these parameters have been used in the literature. For example, dc varied in a range of 4-5 Å in previous studies,6-11,14,29,30 which represents the typical interaction range of most amino acids. A dc=5 Å for the distance between the centers of mass of side chains was also proposed.15 In addition, a double-cutoff scheme was adopted to define a noncovalent contact, in which the Cα-Cα distance of contacting residues must be in a range of 4-8 Å.12,13 For nc, although many methods exclude covalent contacts (nc=2),6,7,10-13 one study proposed that covalent links are important for allosteric communications in proteins.17 In dCNA, covalent contacts are included in the construction of the consensus network and subsequent community detection, but are automatically excluded from the difference network because covalent contacts do not change across ensembles. Moreover, drastically different pc values have been employed: pc=0.75 in the dynamical network analysis,11 whereas pc=0.2 in a recent work benchmarking protein structure networks.15 Note that although these model parameters are free to change, users should understand the expected consequences of the changes. For example, increasing dc or reducing pc will result in a more highly connected consensus network, leading to an increase in the computational time for community detection. An increase (decrease) of nc, on the other hand, will not affect the computational cost but will weaken (enhance) the weight of local contact changes in the results. In the following, we test dCNA with the suggested parameters on three distinct biological systems, i.e. the transcription factor NF-κB, the peptidyl-prolyl cis-trans isomerase (PPIase) Pin1, and the PPIase cyclophilins.
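
The qualitative effects of pc and nc noted above can be checked on a toy example. The following sketch is only illustrative: the function names and contact probabilities are hypothetical, residues are assumed to lie on one chain with consecutive numbering, and a consensus edge is assumed to require the cutoff to be met in every ensemble.

```python
def n_consensus_edges(prob_list, pc):
    """Number of pairs whose contact probability is >= pc in every ensemble."""
    shared = set.intersection(*(set(p) for p in prob_list))
    return sum(1 for pair in shared if all(p[pair] >= pc for p in prob_list))

def total_abs_change(probs1, probs2, nc):
    """Sum of |df| over pairs separated by at least nc residues in sequence."""
    total = 0.0
    for i, j in set(probs1) | set(probs2):
        if abs(i - j) >= nc:
            total += abs(probs2.get((i, j), 0.0) - probs1.get((i, j), 0.0))
    return total

# Hypothetical contact probabilities for two ensembles.
ens1 = {(1, 2): 1.0, (1, 3): 0.5, (1, 4): 0.875, (2, 6): 1.0, (3, 9): 0.25}
ens2 = {(1, 2): 1.0, (1, 3): 1.0, (1, 4): 0.75, (2, 6): 1.0, (3, 9): 0.75}

# Lowering pc admits more edges into the consensus network.
for pc in (0.9, 0.75, 0.2):
    print("pc =", pc, "consensus edges:", n_consensus_edges([ens1, ens2], pc))

# Raising nc removes local contact changes from the difference network.
for nc in (2, 3, 4):
    print("nc =", nc, "total |df|:", total_abs_change(ens1, ens2, nc))
```

With these numbers the consensus network grows from 2 to 3 to 5 edges as pc is lowered from 0.9 to 0.75 to 0.2, and the total absolute contact change retained in the difference network drops from 1.125 to 0.625 to 0.5 as nc is raised from 2 to 3 to 4.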

Pirin modulated dynamical changes in the NF-κB/DNA complex

The dCNA method identifies intrinsic modular structures of the Pirin/NF-κB/DNA supramolecular complex and reveals dynamical changes between modules under distinct oxidation states of the Pirin-bound iron. The nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) is a transcription factor participating in several essential physiological and pathophysiological processes, including immune and inflammatory responses.35 The activity of NF-κB is regulated by Pirin, a nonheme iron (Fe) binding nuclear protein overexpressed in response to oxidative stress.36 Our recent MD studies revealed a redox-specific allosteric mechanism in the Pirin/NF-κB/DNA complex.37 Specifically, only ferric iron, Fe(III), and not ferrous iron, Fe(II), can activate Pirin, which in turn binds to NF-κB and modulates NF-κB/DNA interactions, DNA conformational dynamics, and the subsequent transcription. To elucidate the dynamical changes underlying NF-κB activation, we applied dCNA to the previously described simulations under Fe(II)- and Fe(III)-bound conditions.37 Seven communities are detected, distributed nearly symmetrically in the homodimeric system (Figure 1C). In particular, DNA is split into two approximate halves, each of which forms a community together with one of the N-terminal subdomains of the NF-κB dimer (Figure 1C; red and blue). The major dynamical changes during activation are the detachment (contact breakages) of Pirins from the C-terminal communities of NF-κB (Figure 1C; dark grey) and the attachment (contact formations) of Pirins to the N-terminal communities of NF-κB (Figure 1E). In addition, a slight increase of contacts is observed between the N-terminal communities of the dimer. These changes may underlie the enhanced binding affinity between NF-κB and DNA and the altered conformational dynamics of DNA.37 Note that these dynamical changes are not apparent in the residue-wise contact analysis, where contact formations and breakages are distributed all over the molecule (Figure 1D).

The dCNA-based community analysis also reveals dynamical changes upon Pirin binding. A Pirin-free NF-κB/DNA complex simulation was added to the aforementioned simulations, and intra-Pirin changes were ignored. Communities were first detected over the three conformational ensembles, and contact changes were then analyzed between the Pirin-free ensemble and each of the two Pirin-bound ensembles. The analysis shows that the C-terminal subdomains of the NF-κB dimer are split into two communities, one for each monomer (Figure S3; white and dark grey). Also, DNA is classified as an independent community (Figure S3; green). These observations indicate loose binding between DNA and NF-κB as well as between NF-κB monomers in the absence of Pirin. Upon binding of Pirins, either Fe(II)- or Fe(III)-bound, the overall changes in the system are contact formations (Figure S3). In particular, many more contacts are formed between the C-terminal communities as well as between NF-κB and DNA, indicating enhanced interactions between these regions. The overall changes are similar for the binding of Fe(II)- and Fe(III)-bound Pirins, except for differences in the region between Pirins and NF-κB (Figure S3).

Sequence dependent allostery in Pin1

The dCNA method delineates the mechanism underlying sequence-dependent allosteric regulation in human Pin1. Pin1 is a phosphorylation-dependent PPIase that is upregulated in several cancers and involved in Alzheimer’s disease.38,39 Our results suggest that substrate binding to the catalytic domain is modulated by the substrate bound in the noncatalytic WW domain in a sequence-dependent manner.40 To further understand the underlying mechanism, we applied dCNA to Pin1. MD simulations under apo-WW, WW bound with a positive allosteric modulator (PAM; FFpSPR), and WW bound with a neutral allosteric modulator (NAM; AVVRpTPPKSP) were performed in previous work.40 Three communities are identified, one for the WW domain and two for the catalytic domain (Figure 2A). Intriguingly, upon binding PAM to WW
overall net changes are contact formations between all communities; in particular, contacts between the catalytic communities are drastically enhanced (Figure 2C & D). In contrast, upon binding NAM to WW, net changes between the WW community (Figure 2A; blue) and the catalytic community containing the substrate binding site (Figure 2A; grey) are contact breakages, whereas changes between WW and the other catalytic community near the “hinge” region (Figure 2A; red) are predominantly contact formations (Figure 2E & F). In addition, contacts between the catalytic communities remain largely unchanged in the NAM binding case. These results collectively suggest a hinge-like domain motion upon binding different substrates in the WW domain (Figure 2B), which is coupled to the dynamics in the catalytic domain, similar to the findings of a recent simulation study on Pin1.41

Conservation of dynamics across cyclophilin isoforms

The dCNA method dissects contact dynamical changes in human cyclophilins and identifies both conserved and variable changes across distinct isoforms. Cyclophilin is a PPIase responsible for regulating several essential physiological processes and is overexpressed in various types of cancer.42 Previously, we examined the conservation of dynamics across three cyclophilin isoforms, i.e. cyclophilin A (denoted by CypA), D (CypD), and E (CypE), during distinct enzymatic processes and identified key residues determining common and isoform-specific dynamical changes.43 Here, we test dCNA using these simulation data. Specifically, nine conformational ensembles generated by substrate-free, cis-substrate bound, and ts (transition state)-substrate bound simulations for each isoform were considered. Nine communities are identified. Intriguingly, the binding pocket is partitioned into one large and two small communities (Figure 3; light gray, yellow, and tan). Also, two flexible loops are identified as independent communities (Figure 3; green and white). The analysis shows that, during substrate binding, the patterns of dynamical changes in CypA and CypE are very
similar, both of which are different from those in CypD (Figure 3; free to cis). In contrast, changes are much more conserved across all isoforms during catalysis (Figure 3; cis to ts). A simple way to quantitatively measure the conservation is to count the number of edges having the same color (blue or red) across isoforms, which gives 0 and 5 for substrate binding and catalysis, respectively. The results are consistent with our previous study, which concluded that the functional catalytic process is more dynamically conserved than substrate binding.43 The community analysis not only reproduces the same conclusion but also provides an easy way to interpret the underlying changes.

The dCNA-based community analysis also facilitates the comparison of conformational dynamics between cyclophilin B (CypB) and C (CypC), a branch of isoforms slightly remote from CypA/D/E in the phylogenetic tree (see Figure S1 in the reference43). A very similar community partition is observed, except that the two flexible loops that form two separate communities in CypA/D/E merge into one community in CypB/C (Figure S4; green). Because of this difference, the signature dynamical differences between CypA/D/E involving these loops are absent between CypB/C (Figure S4). The analysis shows that although the overall pattern of dynamical changes is different between the two groups of isoforms, both reveal that catalysis is more dynamically conserved than substrate binding (in the case of CypB/C, the numbers of edges having the same colors between isoforms are 4 and 8 for substrate binding and catalysis, respectively).

Mapping residue wise determinants of allosteric communications with dCNA

The dCNA method can easily be adapted to examine network metrics other than communities. Popular network metrics related to identifying key residues potentially mediating allosteric communications include suboptimal path,23,44 residue centrality,23,45 and Guimerà-Amaral cartography.46 A certain number of suboptimal paths (measured by path lengths) connecting two
end residues, termed source and sink, can be identified to map the most probable pathways that allosteric signals traverse through a molecule. Residues can be ranked by the fraction of paths going through the corresponding nodes, and the top-ranked residues are predicted to be key residues for the allosteric coupling. Residues can also be ranked by betweenness centrality, which is a shortest-path-based metric that evaluates the capability of a residue to mediate global information transmission. In contrast, eigenvector centrality can be used to measure the local density of connections. Guimerà-Amaral cartography can be carried out once a partition of communities is obtained, which allows discriminating between intracommunity and intercommunity residues. The analysis involves the calculation of two parameters, P (participation coefficient) and z (intraconnectivity z-score). Nodes with high P and low z tend to be nonhub connectors of communities (and thus crucial for allosteric communications). The P-z analysis has been used to identify residues with distinct roles in allosteric regulation and to discriminate between allosteric and nonallosteric changes.12,13,47-49 Although all of the above calculations were previously performed on static contact networks, their extension to difference contact networks is straightforward: the major modification is to weight edges by either the absolute contact changes (representing an importance network) or their reciprocals (a distance network).
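
As a hedged illustration of this extension, the sketch below converts a residue-wise difference network into importance and distance weights and then computes P and z for each residue. It is not part of the published method or of Bio3D: the function names are hypothetical, the use of weighted strengths (sums of |df|) in place of the unweighted degrees of the original Guimerà-Amaral definitions is an assumption, and the population standard deviation is used for z.

```python
from collections import defaultdict
from math import sqrt

def importance_and_distance(diff):
    """Importance weight = |df|; distance weight = 1/|df| (zero changes dropped)."""
    importance = {pair: abs(df) for pair, df in diff.items() if df != 0.0}
    distance = {pair: 1.0 / w for pair, w in importance.items()}
    return importance, distance

def pz_cartography(weights, community_of):
    """Return {node: (P, z)} using edge weights as connection strengths."""
    # Strength of each node toward each community.
    to_comm = defaultdict(lambda: defaultdict(float))
    for (i, j), w in weights.items():
        to_comm[i][community_of[j]] += w
        to_comm[j][community_of[i]] += w
    # Within-community strength of every node, grouped by community.
    within = defaultdict(dict)
    for node, comm in community_of.items():
        within[comm][node] = to_comm[node].get(comm, 0.0)
    result = {}
    for node, comm in community_of.items():
        total = sum(to_comm[node].values())
        # Participation coefficient: 1 - sum over communities of (k_is / k_i)^2.
        p = 1.0 - sum((s / total) ** 2 for s in to_comm[node].values()) if total > 0 else 0.0
        vals = list(within[comm].values())
        mean = sum(vals) / len(vals)
        sd = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        # z-score of the node's within-community strength.
        z = (within[comm][node] - mean) / sd if sd > 0 else 0.0
        result[node] = (p, z)
    return result

# Hypothetical difference network (signed df) and community assignment.
diff = {(1, 2): 1.0, (1, 3): -1.0, (2, 3): 1.0,
        (4, 5): -1.0, (5, 6): 1.0, (3, 4): -0.5}
community_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

importance, distance = importance_and_distance(diff)
pz = pz_cartography(importance, community_of)
for node in sorted(pz):
    print(node, "P = %.3f" % pz[node][0], "z = %.3f" % pz[node][1])
```

In this toy network, residue 4 has the highest P together with a negative z, the signature of a nonhub connector, whereas residue 5 has P = 0 and the highest z in its community. The distance weights would be the natural input for shortest-path-based metrics such as betweenness centrality or suboptimal paths.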

Conclusions

We propose a new method for network community analysis of protein conformational dynamics. The method, difference contact network analysis or dCNA, produces a consistent partition of communities among simulation systems, thus resolving the difficulty in the comparison of intercommunity changes due to varying community boundaries between simulations in previous
network analysis methods. The dCNA method also captures dynamic information about transitions between simulation generated conformational ensembles that is absent in previous methods. This makes dCNA more suitable for allosteric systems that involve large-amplitude conformational changes during allosteric transitions. Tests on NF-κB, Pin1, and cyclophilins show that dCNA is a promising tool to elucidate allosteric communications in proteins. The method can be easily adapted to analyze crystallographic structures and NMR chemical shift correlation matrices, and be extended to examine network metrics other than communities.

Computational Methods

Previously described simulations for NF-κB (1.3 µs per trajectory),37 Pin1 (1.2 to 2.3 µs),40 and cyclophilins (2.1 to 2.7 µs)43 were analyzed. In addition, microsecond-long MD simulations were performed with AMBER1450 for CypB (2.4 to 2.5 µs) and CypC (2.4 to 2.9 µs) in the substrate-free, cis-bound, and ts-bound states, with initial conformations taken from the PDB51 (3ICH and 2ESL, respectively). The procedures to model the initial structures of cyclophilin/substrate complexes and all simulation parameters are the same as previously described.43 Dynamical network analysis was performed as previously described.11 Dynamic cross-correlation matrices were calculated with the CPPTRAJ program of AMBER.52 All other analyses were performed with the Bio3D R package.53,54 Protein sequences were aligned with MUSCLE55 via the utilities in Bio3D to find equivalent residues across cyclophilin isoforms. Molecular graphics were generated with VMD 1.8.56 All other figures were made with Bio3D and ggplot2.57

Figure 1. Workflow of difference contact network analysis. (A) Calculating the residue-residue contact network for each simulation independently. (B) Constructing a consensus contact network, in which each edge represents a stable contact (with probability of occurrence f≥0.9) across all input simulation-derived conformational ensembles. (C) Detecting communities (colored regions) based on the consensus network. (D) Calculating a residue-residue difference contact network by subtracting the contact probabilities of one network from their counterparts in the other network. Red
(blue) indicates a negative (positive) contact probability difference, i.e. more contact breakages (formations) from network 1 to network 2. (E) Calculating a community-community difference contact network by mapping the community partition obtained in C to the difference contact network obtained in D. Communities are represented by colored vertices (as in C), with the radius of each vertex proportional to the number of residues in the corresponding community. The line linking two vertices describes the net contact probability change (df) between the communities from network 1 to network 2. Blue and red lines indicate positive and negative changes (with df labeled), respectively, and the line width is determined by |df|.

Figure 2. Mechanisms of sequence dependent allosteric regulations in Pin1 revealed with dCNA. (A) Three residue communities are identified. The flexible linker region connecting the two domains is excluded from the analyses for clarity. (B) Superimposition of structures under apo (white), positive allosteric modulator (denoted by PAM)-bound (cyan), and neutral allosteric modulator (NAM)-bound (orange) conditions derived from MD simulations. (C,E) Residue-residue difference contact networks calculated by comparing the apo-state contact network to the PAM- and NAM-bound networks, respectively. Red (blue) bars indicate more contact breakages (formations) from the apo to a substrate-bound network. (D,F) Corresponding community-community difference contact networks. Communities are represented by colored vertices (as in A), with the radius of each vertex proportional to the number of residues in the community. The line linking two vertices describes the net contact probability change (df) between the communities. Blue and red
lines indicate positive and negative changes (with df labeled), respectively, and the line width is determined by |df|.

Figure 3. Conserved dynamics across cyclophilin isoforms revealed with dCNA. The substrate binding (denoted by “free→cis”) and catalytic (“cis→ts”) processes for CypA, CypD, and CypE, totaling nine simulations, are examined. Nine communities are identified. Both residue- and community-level difference contact networks are shown for each isoform and each enzymatic process. In the representation of both levels, blue and red lines (bars) indicate contact formations
and breakages, respectively, upon substrate binding or catalytic turnover. In community wise networks, communities are represented by colored vertices (as in Top), with radius of vertex proportional to the number of residues in the community. The line linking vertices describes the net contact probability change (df) between the communities, with line width determined by |df|. Lines with |df|≥0.1 are labeled, whereas lines with |df|