Motions of Allosteric and Orthosteric Ligand-Binding Sites in Proteins


Article

Motions of allosteric and orthosteric ligand binding sites in proteins are highly correlated Xiaomin Ma, Hu Meng, and Luhua Lai J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00039 • Publication Date (Web): 31 Aug 2016 Downloaded from http://pubs.acs.org on September 1, 2016


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Motions of Allosteric and Orthosteric Ligand-Binding Sites in Proteins are Highly Correlated

Xiaomin Ma†, Hu Meng‡, and Luhua Lai†,‡





Center for Quantitative Biology, Peking University, Beijing 100871, China



BNLMS, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China

║ Peking-Tsinghua Center for Life Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China

KEYWORDS: Gaussian network model; correlation; allosteric site prediction; cavity

1 ACS Paragon Plus Environment


ABSTRACT: Allostery is the phenomenon in which a ligand binding at one site affects other sites in the same macromolecule. Allostery has important roles in many biological processes. Theoretically, all nonfibrous proteins are potentially allosteric. However, few allosteric proteins have been validated, and the identification of novel allosteric sites remains a challenge. Because the motions of residues and subunits underlie protein function, we hypothesized that the motions of allosteric and orthosteric sites are correlated. Using a dataset of 24 known allosteric sites from 23 monomeric proteins, we calculated the correlations between potential ligand-binding sites and the corresponding orthosteric sites with a Gaussian network model (GNM). The motions of most known allosteric sites were highly correlated with those of the corresponding orthosteric sites, whereas the motions of other surface cavities were not. These high correlations were robust across different structures of the same protein, such as structures of the apo state and of the orthosteric effector-bound state, whereas the contributions of different frequency modes to the motion correlations depended on the protein. High correlations between allosteric and orthosteric site motions were also observed in oligomeric allosteric proteins. We applied motion correlation analysis to predict potential allosteric sites in the 23 monomeric proteins, and some of these predictions were in good agreement with published experimental data. We also used motion correlation analysis to identify a novel allosteric site in 15-lipoxygenase (an enzyme in the arachidonic acid metabolic network) using recently reported activating compounds. Our analysis correctly identified this novel allosteric site along with two other sites that are currently under experimental investigation. Our study demonstrates that the motions of allosteric sites are highly correlated with those of orthosteric sites, and our correlation analysis method provides a new tool for predicting potential allosteric sites.


 INTRODUCTION

Allostery has key roles in many biological processes, including enzyme catalysis,1 signal transduction,2 and gene regulation.2 Allostery can be defined as action at a distance,3, 4 whereby a perturbation at one site of a macromolecule causes functional changes at another site. Allosteric regulation of protein function can result from effector binding (of small molecules, lipids, DNA/RNA, or proteins),5-7 from covalent modifications such as phosphorylation,8 and from photon absorption.9 Compared with traditional orthosteric drugs, allosteric drugs have several advantages, including fewer side effects and easier up- or down-regulation of target activity.10 Theoretically, all proteins except fibrous proteins are potentially allosteric;11 however, allosteric sites have been identified in only a few proteins. The AlloSteric Database v3.0 currently catalogs 1,930 allosteric site-modulator structural complexes.12 Several studies have attempted to identify allosteric binding sites during the past decade, most of them using machine-learning methods. Demerdash et al. used static and dynamic features to distinguish experimentally characterized allosteric hotspots from non-allosteric hotspots with a support vector machine model;13 however, the study did not investigate whether the predicted allosteric hotspots could be used for small-molecule ligand design. Huang et al. combined a support vector machine algorithm with FPocket14 (a pocket detection program) to predict allosteric sites and built the online prediction web server AlloSite.15

Protein conformational and property changes caused by allosteric effector binding have also been used to identify allosteric sites. Qi et al. mapped surface pockets using the CAVITY program, simulated allosteric effector binding by applying an artificial force in a two-state Gō model, and then predicted novel allosteric sites among the surface pockets.16, 17 They identified two novel allosteric sites in Escherichia coli D-3-phosphoglycerate dehydrogenase and used these sites to design novel allosteric modulators.16, 17 Ma et al. found that allosteric effector binding induced changes in residue-residue interaction patterns.18 They also developed a computational method that predicts potential allosteric sites from the differences in residue-residue interaction energies within cavities of interest between two distinct protein states.18 That study used a molecular mechanics generalized Born surface area energy decomposition strategy to analyze conformational ensembles of the target protein generated by molecular dynamics (MD) simulations.18 Panjkovich et al. performed normal mode analysis (NMA) and reported that protein flexibility changed significantly upon allosteric-ligand binding in 70% of all cases.19 They also identified potential ligand-binding sites using LIGSITEcsc,20 placed a simulated octahedral ligand at the center of each potential site, and tested for significant changes in protein flexibility. Subsequently, they developed the Protein Allosteric and Regulatory Sites (PARS) web server for allosteric site prediction.21 Greener et al. analyzed protein flexibility changes to approximate the effect of ligand binding on the flexibility of restrictive residues, combined these results with machine learning to predict allosteric sites,22 and subsequently built the AlloPred web server.22 Goncearenco et al. recently developed the SPACER web server23 by combining Monte Carlo simulations and NMA. The protein surface is first probed with Monte Carlo simulations to identify potential ligand-binding sites; if the binding leverage24 of a site is high, ligand binding there can lock specific protein conformations and cause allosteric effects. McClendon et al. analyzed correlated motions between residues and developed the MutInf method,25 which they used to evaluate conformational ensembles of human interleukin-2 generated by MD simulations. They reported strong correlations between the dihedral angle motions of residues lining two small-molecule binding sites. Süel et al. applied statistical coupling analysis (SCA) to three large protein families (G protein–coupled receptors, the chymotrypsin class of serine proteases, and hemoglobins) and reported that residues in allosteric pathways were strongly coupled.26 Subsequent experimental and computational studies reported similar results.27-30 Because these analyses included residues in both orthosteric and allosteric sites, orthosteric sites may be more tightly coupled with allosteric sites than with other sites (Figure 1). This type of coupling was observed in human interleukin-2 using MutInf analysis25 and in 3-phosphoinositide-dependent protein kinase-1 (PDK1) using a modified elastic network model.31


Figure 1. Schematic diagram of correlation strength (represented by arrow thickness) between the orthosteric site and other sites. The orthosteric site is presumed to correlate more strongly with the allosteric site than with other sites of unknown function.

In the present study, we hypothesized that the motions of orthosteric and allosteric sites are highly correlated. We then used this presumed correlation to rapidly predict allosteric sites with a coarse-grained NMA model that calculates motion correlations between orthosteric sites and candidate sites. Our test dataset contained 24 known allosteric sites in monomeric proteins, and the motions of 23 of these allosteric sites were highly correlated with the motions of the corresponding orthosteric sites. We then calculated correlations in oligomeric proteins and found similar results, indicating that the motions of allosteric sites are indeed highly correlated with the motions of orthosteric sites. We used this strategy to develop a new allosteric site prediction method that is fast and easy to use.

MATERIALS AND METHODS

Protein Dataset. Allosteric proteins were collected from the Core-Diversity set of ASBench.32 We built a test dataset containing only monomeric proteins according to three criteria: (i) the protein structure in the Protein Data Bank33 (PDB) contained both the orthosteric site and a known allosteric site; (ii) if the PDB structure included the whole protein, UniProt34 confirmed that the protein functions as a monomer, and if the PDB structure covered only part of the protein, its biological assembly in the PDB file was a monomer; and (iii) the allosteric effector-binding site determined by CAVITY35 did not substantially overlap the orthosteric site. We selected 23 known monomeric allosteric proteins meeting these criteria, which included 24 known allosteric sites (Table 1).

Table 1. List of Test Proteins and Z-Score Ranking

| Protein Name | PDB ID_OS | PDB ID_AS | Rank of $Z\text{-}Score^{all}_{AS}$ in all cavities except OS | $Z\text{-}Score^{all}_{AS}$ |
|---|---|---|---|---|
| Tyrosine protein phosphatase non-receptor type 1 (PTP1B) | 1C85 | 1T49 | 3/13 | 1.0 |
| Glucokinase (HK4) | 3FGU | 1V4S | 1/10 | 2.6 |
| β-lactamase | 1AXB | 1PZO | 1/7 | 1.6 |
| Cell division protein kinase 2 (CDK2) | 1B38 | 3PXZ | 2/15 | 1.8 |
| Hexokinase type I (HK1) | 1DGK | 1CZA | 1/33 | 3.4 |
| Myosin II heavy chain | 2JHR | 2JHR | 2/24 | 2.1 |
| Casein kinase 2α (CK2α) | 3H30 | 3H30 | 1/15 | 1.9 |
| Tyrosine protein kinase ABL1 (Bcr-Abl) | 3K5V | 3K5V | 1/10 | 2.5 |
| Serine/threonine protein kinase Chk1 (Chk1) | 2QHN | 3JVR | 3/9 | 0.8 |
| NS5B RNA-directed RNA polymerase (HCV NS5B) | 1GX6 | 2HAI | 2/14 | 1.6 |
| Mitogen-activated protein kinase 14 (MAP14) | 4F9W | 3NEW | 2/12 | 1.2 |
| Glutamate racemase (MurI) | 2JFN | 2JFN | 2/6 | 0.6 |
| Insulin-like growth factor 1 receptor (IGF-1R) | 1K3A | 3LW0 | 2/10 | 1.4 |
| Kinesin-like protein KIF11 (KIF11) (AS1) | 3ZCW | 4BBG | 3/11 | 0.7 |
| Kinesin-like protein KIF11 (KIF11) (AS2) | 3ZCW | 4BBG | 4/11 | 0.5 |
| Mitogen-activated protein kinase 8 (MAP8) | 2XRW | 3O2M | 1/12 | 2.0 |
| Muscarinic acetylcholine receptor M2 | 4MQT | 4MQT | 1/7 | 2.1 |
| Protein RecA (RecA) | 2G88 | 2G88 | 2/13 | 1.8 |
| RTX toxin RtxA | 3GCD | 3GCD | 1/4 | 1.7 |
| Tyrosine-protein kinase ABL1 (c-Abl) | 3PYY | 3PYY | 1/7 | 2.2 |
| Cell division control protein 4 (Cdc4) | 3V7D | 3MKS | 3/11 | 0.8 |
| Cytochrome P450 3A4 (P450 3A4) | 1W0G | 1W0F | 10/17 | −0.4 |
| Focal adhesion kinase 1 (FAK1) | 2IJM | 4EBW | 1/8 | 2.2 |
| Isocitrate dehydrogenase kinase/phosphatase (AceK) | 3LCB | 3LCB | 1/18 | 2.5 |

OS, orthosteric site; AS, allosteric site; PDB ID_OS, structure used to define OS; PDB ID_AS, protein structure in the allosteric effector-bound state.

Gaussian Network Model. The Gaussian network model (GNM) is a minimalist NMA model used to study biological molecules.36 In the GNM, each residue is modeled as an identical node, and residue pairs within a cutoff distance r are connected by harmonic springs of equal strength. The GNM used in this study was based on the implementation of Bahar and coworkers.37 Alpha carbon (Cα) atoms were used as nodes, and r was set to 7 Å on the basis of statistical analysis.38, 39

The Kirchhoff (connectivity) matrix Γ controls the dynamics of the modeled protein. It is defined as40

$$\Gamma_{ij} = \begin{cases} -1, & i \neq j \text{ and } R_{ij} \le r \\ 0, & i \neq j \text{ and } R_{ij} > r \\ -\sum_{k \neq i} \Gamma_{ik}, & i = j \end{cases} \qquad (1)$$

where $R_{ij}$ is the distance between the $i$th and $j$th Cα atoms. Let $\Delta\mathbf{R}_i$ denote the displacement of the $i$th residue from its equilibrium position. The cross-correlation between the fluctuations of residues $i$ and $j$ is given by41, 42

$$\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle = \frac{3 k_B T}{\gamma} \left[\Gamma^{-1}\right]_{ij} \qquad (2)$$

where $k_B$ is Boltzmann's constant, $T$ is the absolute temperature, and $\gamma$ is the force constant of the harmonic springs (set to 1 kcal mol−1 Å−2 in this study). Because Γ is singular, $[\Gamma^{-1}]_{ij}$ is evaluated from the nonzero modes as $[\Gamma^{-1}]_{ij} = \sum_{k=2}^{N} u_{ik} u_{jk} / \lambda_k$, where $u_{ik}$ is the $i$th element of the $k$th eigenvector, $\lambda_k$ is the $k$th eigenvalue, and $N$ is the number of residues in the target protein. The normalized cross-correlation is given by40

$$C_{ij} = \frac{\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle}{\left(\langle \Delta\mathbf{R}_i^2 \rangle \langle \Delta\mathbf{R}_j^2 \rangle\right)^{1/2}} = \frac{[\Gamma^{-1}]_{ij}}{\left([\Gamma^{-1}]_{ii} [\Gamma^{-1}]_{jj}\right)^{1/2}} \qquad (3)$$

where $C_{ij}$ lies between −1 and 1; the larger the absolute value of $C_{ij}$, the more strongly correlated the two residues. Doruker et al. showed that correlations determined from the GNM and from MD simulations were similar.43

Correlation Analysis of Orthosteric Sites and Surface Cavities. Besides orthosteric sites, proteins contain other potential ligand-binding sites, which we identified using CAVITY.35 These sites are designated cavity_m, the $m$th site detected by CAVITY.35 To measure the correlation between a putative cavity_m site and the orthosteric site, we define the total correlation of the two regions, following Ma et al.,44 as

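$$TC_{cavity_m} = \sum_{i \in OS,\ j \in cavity_m} C_{ij} \qquad (4)$$

where OS is the set of orthosteric site residues and cavity_m is the set of residues in the $m$th cavity.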
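A minimal NumPy sketch of Eqs. 1-4 is given below. It is not the Bahar-group implementation used in this study; it assumes a single connected contact network (exactly one zero eigenvalue), takes Cα coordinates as an (N, 3) array, and represents each site as a list of 0-based residue indices (a hypothetical convention). The prefactor $3k_BT/\gamma$ of Eq. 2 cancels in Eq. 3 and is therefore omitted.

```python
import numpy as np


def kirchhoff_matrix(ca_coords: np.ndarray, cutoff: float = 7.0) -> np.ndarray:
    """Eq. 1: Kirchhoff matrix from an (N, 3) array of C-alpha coordinates in Angstrom."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # R_ij
    gamma = np.where(dist <= cutoff, -1.0, 0.0)     # -1 for every pair within r
    np.fill_diagonal(gamma, 0.0)                    # clear self-pairs before summing
    np.fill_diagonal(gamma, -gamma.sum(axis=1))     # Gamma_ii = -sum_{k != i} Gamma_ik
    return gamma


def gnm_modes(gamma: np.ndarray):
    """Eigenpairs in ascending eigenvalue order with the zero (rigid-body) mode dropped.

    Assumes one connected network, i.e. exactly one zero eigenvalue.
    Column k of the returned eigenvector matrix holds u_ik for modes k = 2..N.
    """
    evals, evecs = np.linalg.eigh(gamma)
    return evals[1:], evecs[:, 1:]


def correlation_matrix(evals: np.ndarray, evecs: np.ndarray) -> np.ndarray:
    """Eq. 3: normalized cross-correlations C_ij."""
    ginv = (evecs / evals) @ evecs.T                # [Gamma^-1]_ij = sum_k u_ik u_jk / lambda_k
    scale = np.sqrt(np.diag(ginv))
    return ginv / np.outer(scale, scale)


def total_correlation(corr: np.ndarray, site_a, site_b) -> float:
    """Eq. 4: sum of C_ij over residues i in one site and j in the other."""
    return float(corr[np.ix_(site_a, site_b)].sum())


if __name__ == "__main__":
    # Toy straight chain of 20 pseudo-residues spaced 3.8 A apart (not a real protein).
    coords = np.column_stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)])
    lam, u = gnm_modes(kirchhoff_matrix(coords))
    C = correlation_matrix(lam, u)
    print(C[0, 1], C[0, -1])                        # neighbors vs. chain ends
    print(total_correlation(C, [0, 1], [18, 19]))   # two hypothetical "sites"
```

On this toy chain, neighboring pseudo-residues come out positively correlated and the two chain ends anticorrelated; for a real protein, `ca_coords` would hold the Cα coordinates of the chosen PDB structure.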

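Every eigenmode contributes to the total correlation. Therefore, the total correlation of the $k$th mode is defined as

$${}^{k}TC_{cavity_m} = \sum_{i \in OS,\ j \in cavity_m} \frac{[\Phi^{-1}]^{k}_{ij}}{\left([\Phi^{-1}]^{k}_{ii}\, [\Phi^{-1}]^{k}_{jj}\right)^{1/2}}, \qquad [\Phi^{-1}]^{k}_{ij} = \frac{u_{ik} u_{jk}}{\lambda_k} \qquad (5)$$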

The total correlation of a mode can be either positive or negative. Because we are interested in the magnitude of the correlation between the two regions, the absolute values of ${}^{k}TC_{cavity_m}$ are summed over all modes:

$$TC^{all}_{cavity_m} = \sum_{k=2}^{N} \left| {}^{k}TC_{cavity_m} \right| \qquad (6)$$
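Eqs. 5 and 6 can be sketched as follows, given GNM eigenvalues and eigenvectors with the zero mode removed (for example from `numpy.linalg.eigh`). As we read Eq. 5, each mode is normalized by its own diagonal terms, so $\lambda_k$ cancels and every residue pair contributes the sign of $u_{ik}u_{jk}$; the code keeps $\lambda_k$ explicit to mirror the equation and, as an added assumption, skips pairs whose eigenvector components vanish. Function names are hypothetical.

```python
import numpy as np


def per_mode_total_correlation(evals, evecs, site_a, site_b, eps=1e-12):
    """Eq. 5: ^k TC for every retained mode k (one value per mode).

    evals: (M,) nonzero GNM eigenvalues; evecs: (N, M) with evecs[i, k] = u_ik.
    site_a, site_b: lists of 0-based residue indices (hypothetical convention).
    """
    ua = evecs[site_a, :]                                # (|A|, M)
    ub = evecs[site_b, :]                                # (|B|, M)
    phi_ij = ua[:, None, :] * ub[None, :, :] / evals     # [Phi^-1]^k_ij
    phi_ii = (ua ** 2 / evals)[:, None, :]               # [Phi^-1]^k_ii
    phi_jj = (ub ** 2 / evals)[None, :, :]               # [Phi^-1]^k_jj
    denom = np.sqrt(phi_ii * phi_jj)
    terms = np.divide(phi_ij, denom, out=np.zeros_like(phi_ij), where=denom > eps)
    return terms.sum(axis=(0, 1))                        # sum over i in A and j in B


def tc_all(evals, evecs, site_a, site_b) -> float:
    """Eq. 6: sum over modes k = 2..N of |^k TC|."""
    return float(np.abs(per_mode_total_correlation(evals, evecs, site_a, site_b)).sum())
```

Taking absolute values per mode before summing means that modes in which the two sites move in opposite directions still add to $TC^{all}$, which matches the stated aim of measuring the magnitude of coupling rather than its sign.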


Here, the orthosteric site is defined as the residues within 5 Å of the effector that binds at the same site as the protein's endogenous ligand. Calculated cavities in which more than 75% of the residues overlapped with the orthosteric site were excluded. For the remaining cavities, residues shared with the orthosteric site were removed, and all other residues in each cavity were used to calculate $TC^{all}_{cavity_m}$.
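The site-definition rules above (residues within 5 Å of the orthosteric ligand, exclusion of cavities with more than 75% of residues shared with the orthosteric site, and removal of shared residues) can be sketched as below. This assumes atoms are given as plain NumPy arrays with one residue ID per atom and that the 75% overlap is measured relative to the cavity's own residue count; both are illustrative assumptions, not the exact PDB/CAVITY processing used in the study, and the function names are hypothetical.

```python
import numpy as np


def residues_near_ligand(atom_xyz, atom_res, ligand_xyz, cutoff=5.0):
    """Residue IDs with at least one atom within `cutoff` Angstrom of any ligand atom."""
    d = np.linalg.norm(atom_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    close = (d <= cutoff).any(axis=1)               # one flag per protein atom
    return sorted(set(np.asarray(atom_res)[close].tolist()))


def filter_cavities(cavities, orthosteric, max_overlap=0.75):
    """Apply the overlap rules before computing TC^all.

    cavities: dict name -> residue IDs (a hypothetical stand-in for CAVITY output).
    A cavity is dropped if more than `max_overlap` of its residues belong to the
    orthosteric site; otherwise the shared residues are removed and the rest kept.
    """
    os_set = set(orthosteric)
    kept = {}
    for name, residues in cavities.items():
        res_set = set(residues)
        if not res_set:
            continue
        if len(res_set & os_set) / len(res_set) > max_overlap:
            continue                                # essentially the orthosteric site
        remaining = sorted(res_set - os_set)
        if remaining:
            kept[name] = remaining
    return kept
```

For example, with an orthosteric site of residues 1-3, a cavity made of residues 1, 2, 3, and 7 (75% overlap) is kept but reduced to residue 7, whereas a cavity consisting only of residues 1-3 is discarded.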

The correlations were normalized using the Z-score:

$$Z\text{-}Score^{all}_{cavity_m} = \frac{TC^{all}_{cavity_m} - \mu^{all}}{\sigma^{all}} \qquad (7)$$

where $\mu^{all}$ and $\sigma^{all}$ are the mean and standard deviation, respectively, of the cavity correlations $TC^{all}_{cavity_m}$ with the corresponding orthosteric site.
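Eq. 7 and the ranking reported in Table 1 can be sketched as follows. The population standard deviation (NumPy's default, ddof=0) is an assumption, because the text does not state which estimator was used; the dictionary input format and function name are hypothetical.

```python
import numpy as np


def zscore_ranking(tc_all_by_cavity: dict) -> list:
    """Eq. 7: Z-scores of TC^all over one protein's eligible cavities, ranked high to low.

    tc_all_by_cavity maps a cavity name (e.g. "cavity_3") to its TC^all value; cavities
    overlapping the orthosteric site are assumed to be excluded already.
    Returns (rank, name, z) tuples, rank 1 being the most correlated cavity.
    """
    names = list(tc_all_by_cavity)
    values = np.array([tc_all_by_cavity[n] for n in names], dtype=float)
    mu, sigma = values.mean(), values.std()        # population SD (ddof=0): an assumption
    if sigma == 0:
        raise ValueError("all cavities have the same TC^all; Z-scores are undefined")
    z = (values - mu) / sigma
    order = np.argsort(-z, kind="stable")          # highest Z-score first
    return [(rank, names[i], float(z[i])) for rank, i in enumerate(order, start=1)]


if __name__ == "__main__":
    ranking = zscore_ranking({"cavity_1": 12.0, "cavity_2": 30.0, "cavity_3": 18.0})
    for rank, name, z in ranking:
        print(f"{rank}/{len(ranking)}  {name}  Z = {z:.2f}")
```

The printed "rank/count" form mirrors the rank column of Table 1; because the Z-score is a monotonic transform within one protein, it changes the scale of the values but not the ranking itself.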

RESULTS AND DISCUSSION

Known Allosteric Sites Are Highly Correlated with Corresponding Orthosteric Sites in Monomeric Proteins. We first investigated whether the motions of known allosteric sites are highly correlated with those of orthosteric sites in monomeric proteins, using the 23 monomeric proteins listed in Table 1.32 A complete X-ray structure was available only for CDK2; the other 22 structures have one or more missing fragments, which may result from deliberate truncation to reduce protein flexibility for crystallization or from high flexibility in the crystal. Orthosteric site residues were defined as those within 5 Å of a ligand bound at the same site as the protein's endogenous ligand.

We defined a Z-score (Eq. 7) to test whether the correlations between orthosteric and allosteric sites were higher than those between orthosteric sites and other sites of unknown function. For 23 of the 24 known allosteric sites (all except that of P450 3A4), the $Z\text{-}Score^{all}_{AS}$ values were at least 0.5 (Table 1). Previous studies using a modified elastic network model31 and molecular dynamics simulations25 also observed high correlations between the motions of allosteric and orthosteric sites.

We calculated the $Z\text{-}Score^{all}_{cavity_m}$ value between every eligible cavity detected in the allosteric effector-bound state and the corresponding orthosteric site. The resulting values were ranked as shown in Table 1 and Supporting Information Table S1. Among all cavities other than the orthosteric site, 11 of the 24 known allosteric sites ranked first, 7 ranked second, and 4 ranked third. Only the known allosteric site of P450 3A4 ranked outside the top four (10th of 17 cavities). We checked the P450 3A4 crystal structure and found that it lacks 24 N-terminal residues and a 27-residue loop (residues 262−288). P450 3A4 also has a large and extremely malleable active-site cavity that can simultaneously bind multiple molecules of the same ligand.45,46 Therefore, our definition of orthosteric site residues may not include all active-site residues, which may explain the low ranking of its correlated motions.

For the 11 known allosteric sites that ranked second or third, 4 of the 15 cavities ranked above them contained known functional residues. For example, cavity_7 of PTP1B contains the phosphorylation sites Ser242 and Ser243.47 Cavity_4 of CDK2 contains the phosphorylation sites Thr14 and Tyr15, which are located in the known allosteric site, and mutation of these residues can change protein activity.48, 49 Cavity_1 of myosin II heavy chain contains the actin-binding region.50 Cavity_2 of MurI contains the UDP-MurNAc-Ala binding region, which is the known allosteric site.51

For many known allosteric proteins, only a few driver residues in the allosteric site trigger the allosteric effect, whereas the remaining anchor residues stabilize the effector-bound state.52 Therefore, we calculated the correlation between every non-orthosteric residue (residue $i$) and the residues of the orthosteric site (residues $j$) as follows:


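$$TC^{all}_{i} = \sum_{k=2}^{N} \left| \sum_{j \in OS} \frac{[\Phi^{-1}]^{k}_{ij}}{\left([\Phi^{-1}]^{k}_{ii}\, [\Phi^{-1}]^{k}_{jj}\right)^{1/2}} \right| \qquad (8)$$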
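The sketch below evaluates Eq. 8 and the top-30% count used for Supporting Information Table S2. It assumes that Eq. 8 applies the per-mode normalization of Eq. 5 to a single non-orthosteric residue $i$ and sums absolute values over modes $k = 2..N$ as in Eq. 6, and it defines "top 30%" as values at or above the 70th percentile over all non-orthosteric residues; both readings are assumptions, and the function names are hypothetical.

```python
import numpy as np


def residue_tc_all(evals, evecs, orthosteric, eps=1e-12):
    """Eq. 8 (as read here): TC^all_i for every residue i outside the orthosteric site.

    evals: (M,) nonzero GNM eigenvalues; evecs: (N, M) with evecs[i, k] = u_ik.
    Returns dict residue index -> TC^all_i.
    """
    n = evecs.shape[0]
    os_idx = np.asarray(orthosteric)
    others = np.setdiff1d(np.arange(n), os_idx)
    ui = evecs[others, :][:, None, :]              # (n_other, 1, M)
    uj = evecs[os_idx, :][None, :, :]              # (1, n_os, M)
    num = ui * uj / evals                          # [Phi^-1]^k_ij
    den = np.sqrt((ui ** 2 / evals) * (uj ** 2 / evals))
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > eps)
    per_mode = terms.sum(axis=1)                   # sum over j in OS -> (n_other, M)
    return dict(zip(others.tolist(), np.abs(per_mode).sum(axis=1).tolist()))


def count_driver_residues(tc_by_residue, cavities, top_fraction=0.30):
    """Per cavity, count residues whose TC^all_i lies in the top `top_fraction`."""
    values = np.array(list(tc_by_residue.values()), dtype=float)
    threshold = np.quantile(values, 1.0 - top_fraction)
    high = {r for r, v in tc_by_residue.items() if v >= threshold}
    return {name: len(high & set(res)) for name, res in cavities.items()}
```

The cavity with the largest count would then be ranked first, analogous to the Z-score ranking of whole cavities.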

The resulting correlations were ranked to determine whether known allosteric sites contain more driver residues than other cavities. For each cavity_m, we counted the residues with high correlation ($TC^{all}_{i}$ values in the top 30%). Ten known allosteric sites contained the largest number of high-correlation residues, 4 ranked second, and 2 ranked third (Supporting Information Table S2). This indicates that known allosteric sites tend to contain more driver residues that communicate with the corresponding orthosteric sites.

Contributions of Different Frequency Modes to Motion Correlations Depend on Individual Proteins. Low-frequency modes are often related to protein function, whereas high-frequency modes generally reflect local motions.53 To determine which frequency modes contribute most to the correlations between allosteric and orthosteric sites, we modified the total correlation in Eq. 6, following the work of Ma et al.,44 to sum over a window of modes:

$$TC^{ab}_{cavity_m} = \sum_{k=a}^{b} \left| {}^{k}TC_{cavity_m} \right| \qquad (9)$$

where $a \in [2, N]$ and $b \in [2, N]$. We used different mode cutoffs to calculate $TC^{ab}_{cavity_m}$.


Thus, the Z-score is given by

$$Z\text{-}Score^{ab}_{cavity_m} = \frac{TC^{ab}_{cavity_m} - \mu^{ab}}{\sigma^{ab}} \qquad (10)$$

We set $a = 2$ and set $b$ to the number of modes whose weights $1/\lambda_k$ were more than 5, 10, 20, or 30 times that of the highest-frequency mode in the GNM. We also set $b = N$ and set $a$ to the number of modes whose weights $1/\lambda_k$ were more than 5, 10, 20, or 30 times that of the highest-frequency mode. The $Z\text{-}Score^{ab}_{AS}$ rankings of all known allosteric sites in each cutoff category indicate that the contributions of different frequency modes to motion correlations depend on the individual protein (Table 2). For some proteins all modes appear to contribute, whereas other proteins are dominated by low-frequency or by high-frequency modes. We checked the locations of the seven known allosteric sites [β-lactamase, CDK2, CK2α, HCV NS5B, MAP14, IGF-1R, and KIF11(AS1)] whose motion correlations are dominated by relatively high-frequency modes. Six of these (all except MAP14) lie near the corresponding orthosteric sites (Supporting Information Figure S1), suggesting that the relative positions of allosteric and orthosteric sites determine which frequency modes contribute most. This result is consistent with a previous study reporting that high-frequency modes generally reflect local motions.53
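The mode windows behind Eqs. 9 and 10 and the L/H columns of Table 2 can be sketched as follows. We assume the eigenvalues arrive in ascending order with the zero mode removed, and we treat the "L" and "H" windows for a given factor as complementary; the exact boundary used in the study (the text sets $a$ or $b$ to a mode count) may differ by one mode. Function names are hypothetical.

```python
import numpy as np


def mode_windows(evals, factor):
    """Split GNM modes into 'low' (L) and 'high' (H) windows by weight 1/lambda_k.

    evals: nonzero eigenvalues in ascending order (modes k = 2..N).
    L: modes whose weight exceeds `factor` times the weight of the highest-frequency
    mode (largest eigenvalue); H: all remaining modes. Returns two index arrays.
    """
    weights = 1.0 / np.asarray(evals, dtype=float)
    is_low = weights > factor * weights.min()
    return np.flatnonzero(is_low), np.flatnonzero(~is_low)


def tc_window(per_mode_tc, mode_idx) -> float:
    """Eq. 9: sum of |^k TC| over the selected modes."""
    return float(np.abs(np.asarray(per_mode_tc, dtype=float)[mode_idx]).sum())


if __name__ == "__main__":
    evals = np.array([0.5, 1.0, 2.0, 5.0, 10.0])     # toy spectrum, ascending
    per_mode_tc = np.array([3.0, -2.0, 1.0, -1.0, 0.5])
    for factor in (5, 10, 20, 30):
        low, high = mode_windows(evals, factor)
        print(f"L{factor}: {tc_window(per_mode_tc, low)}  H{factor}: {tc_window(per_mode_tc, high)}")
```

Because the weights decrease monotonically with eigenvalue, the L window is always a contiguous block starting at the slowest mode, matching "from the second mode to ..." in the Table 2 footnote; each window's values would then be converted to Z-scores across cavities with Eq. 10.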


Table 2. $Z\text{-}Score^{ab}_{AS}$ Ranking for Different Mode Cutoffs

| Protein Name | L5 | L10 | L20 | L30 | H5 | H10 | H20 | H30 |
|---|---|---|---|---|---|---|---|---|
| PTP1B | 4 | 1 | 1 | 1 | 4 | 6 | 5 | 6 |
| HK4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| β-lactamase | 1 | 2 | 5 | 6 | 1 | 1 | 1 | 1 |
| CDK2 | 2 | 2 | 5 | 4 | 1 | 1 | 1 | 1 |
| HK1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 1 |
| Myosin II heavy chain | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| CK2α | 6 | 7 | 6 | 1 | 1 | 1 | 1 | 1 |
| Bcr-Abl | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Chk1 | 3 | 3 | 2 | 3 | 3 | 3 | 2 | 2 |
| HCV NS5B | 6 | 11 | 4 | 1 | 1 | 1 | 2 | 2 |
| MAP14 | 2 | 7 | 5 | 5 | 2 | 1 | 1 | 1 |
| MurI | 3 | 4 | 1 | 3 | 3 | 2 | 3 | 2 |
| IGF-1R | 3 | 6 | 2 | 10 | 1 | 1 | 2 | 1 |
| KIF11(AS1) | 5 | 8 | 7 | 7 | 2 | 2 | 2 | 2 |
| KIF11(AS2) | 3 | 4 | 5 | 4 | 3 | 4 | 4 | 4 |
| MAP8 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
| Muscarinic acetylcholine receptor M2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| RecA | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| RTX toxin RtxA | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 |
| c-Abl | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Cdc4 | 5 | 8 | 2 | 1 | 3 | 2 | 3 | 3 |
| P450 3A4 | 6 | 12 | 13 | 11 | 15 | 10 | 8 | 8 |
| FAK1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| AceK | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Top 1 | 10 | 9 | 10 | 11 | 13 | 14 | 12 | 14 |
| Top 2 | 4 | 5 | 6 | 4 | 5 | 6 | 7 | 6 |

L5, L10, L20, L30: $Z\text{-}Score^{ab}_{cavity_m}$ was calculated using modes from the second mode to the last mode whose weight $1/\lambda_k$ was more than 5, 10, 20, or 30 times that of the highest-frequency mode in the GNM. H5, H10, H20, H30: $Z\text{-}Score^{ab}_{cavity_m}$ was calculated using modes from the first mode whose weight $1/\lambda_k$ was less than 5, 10, 20, or 30 times that of the highest-frequency mode through the highest-frequency mode. Top 1 and Top 2: number of known allosteric sites whose $Z\text{-}Score^{ab}_{AS}$ ranked first or second for the corresponding mode cutoff.

Robust Correlations between Orthosteric and Allosteric Sites Using Different Structural Data for the Same Protein. Only a few of the 23 proteins used in this study have complex structures with ligands at both the orthosteric and allosteric sites (Myosin II heavy chain, CK2α, Bcr-Abl, MurI, KIF11, RecA, RTX toxin RtxA, and AceK). We therefore investigated whether high correlations between allosteric and orthosteric sites were also observed using protein structures in other states. We collected crystal structures of the 23 test proteins in the apo state or in the holo state with ligands bound only at the orthosteric site. Nine proteins had crystal structures in the apo state, and 18 proteins had crystal structures with ligands bound only at the orthosteric site. To calculate the $Z\text{-}Score^{all}_{cavity_m}$ values, we used the same orthosteric sites, all eligible cavities, and the same residues for normalization of $TC^{all}_{cavity_m}$ as in the calculations on the structures with bound allosteric effectors. For the proteins with apo-state structures, 9 of the 10 known allosteric sites ranked first or second, and all $Z\text{-}Score^{all}_{AS}$ values for these sites were greater than 0.5 (Supporting Information Tables S4 and S5). For the 18 proteins with ligands bound at the orthosteric sites, 16 of the allosteric sites ranked first or second, and all $Z\text{-}Score^{all}_{AS}$ values for these sites were greater than 0.5, except for that of P450 3A4, which had effectors bound only at the orthosteric site. We calculated the Cα root-mean-square deviations (RMSD) between the structures with and without bound allosteric effectors (Supporting Information Table S6). Most proteins did not undergo significant structural changes upon allosteric ligand binding (Cα RMSD < 3 Å); only five underwent significant structural changes (RMSD > 3 Å). We also calculated the volumes of the known allosteric sites in the apo state: compared with other cavities in the proteins, most of these sites were markedly smaller in the absence of the allosteric effector, and some became undetectable (Supporting Information Table S4). These results indicate that structural change is not a good indicator for allosteric site identification. By contrast, our motion correlation calculations were not sensitive to structural changes, so structures with and without bound effectors can both be used. This robustness is particularly useful for predicting new allosteric sites.

Allosteric Site Analysis of Oligomeric Proteins. A survey suggests that more than 35% of all proteins in a typical cell are oligomeric.54 Oligomerization is a key factor in regulating protein function and also minimizes genome size.55 Some proteins have different activities in different oligomeric states.56 Krieger et al. evaluated the effects of oligomerization on the ionotropic glutamate receptor N-terminal domain. They used an anisotropic network model57 to relate the conformational changes of lower-order oligomeric states to the changes observed in the bioactive tetrameric state. The results showed that higher-order oligomeric states always generate new intermolecular interfaces, resulting in potential new sites for allosteric regulation.55 The orthosteric sites of the individual protomers of an oligomer often act cooperatively as allosteric activators or inhibitors. For example, the severe acute respiratory syndrome (SARS) 3C-like proteinase (3CLpro) has one substrate-binding site in each protomer, but only one of these sites is active in the dimer, whereas the other regulates the correct conformation of the active protomer.56 Because protein functional regulation can be complex, when calculating the correlations between potential ligand-binding sites and orthosteric sites in oligomeric proteins, it is advisable to confirm the functional oligomeric state and to define the "orthosteric site" carefully on the basis of the protein's biological activity. We performed motion correlation analysis on two known allosteric oligomeric proteins. The crystal structures used for the calculations and the results are listed in Table 3.


Table 3. List of Test Oligomeric Allosteric Proteins and Z-Score Ranking

| Protein name | PDB ID_OS | PDB ID* | Known AS | Rank | Z-score |
|---|---|---|---|---|---|
| cAMP-activated global transcriptional regulator CRP (CAP) | 1ZRC | 3QOP | cavity_2 | 3/12 | 0.8 |
| | | | cavity_3 | 2/12 | 1.0 |
| Severe acute respiratory syndrome 3C-like proteinase (SARS 3CLpro) | 1UK4 | 1UK4 | cavity_3 (cavity_8 as OS) | 4/14 | 0.8 |
| | | | cavity_8 (cavity_3 as OS) | 3/14 | 1.1 |

* allosteric effector-bound state. OS, orthosteric site; AS, allosteric site.

cAMP-Activated Global Transcriptional Regulator CRP. The cAMP-activated global transcriptional regulator CRP (CAP) is a dimeric allosteric protein included in the Core-Diversity set of ASBench.32 Cyclic AMP allosterically activates the DNA-binding activity of CAP.59, 60 Each protomer has a cyclic AMP binding site, and both protomers bind DNA (PDB ID: 1ZRC).61 Our calculations showed that the two allosteric effector-binding sites of the protomers ranked second and third, with Z-score values of 1.0 and 0.8, respectively (Table 3).
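The ranks and Z-scores reported in Table 3 can be illustrated with a minimal Python sketch. It assumes a precomputed, symmetric matrix of site-site motion correlations and defines a cavity's Z-score as its correlation with the reference (orthosteric) site, standardized by the mean and standard deviation of that site's correlations with all other cavities. This definition, the toy matrix, and the function name `zscore_ranking` are illustrative assumptions rather than the exact CorrSite formula.

```python
import numpy as np


def zscore_ranking(site_corr, ref):
    """Rank cavities by the Z-score of their motion correlation with a reference site.

    site_corr : (n, n) symmetric array of site-site motion correlations
                (assumed input, e.g. averaged residue-level GNM correlations).
    ref       : index of the cavity treated as the orthosteric site.

    Returns (cavity_index, z_score, rank) tuples sorted by decreasing Z-score;
    rank k out of n - 1 mirrors the "k/12" style used in Table 3.
    """
    site_corr = np.asarray(site_corr, dtype=float)
    others = [k for k in range(site_corr.shape[0]) if k != ref]
    values = site_corr[ref, others]
    # Assumed background: the reference site's correlations with every other cavity.
    # (Undefined if all of these correlations are identical, since the std is then 0.)
    z = (values - values.mean()) / values.std()
    order = np.argsort(-z, kind="stable")
    return [(others[i], float(z[i]), rank) for rank, i in enumerate(order, start=1)]


if __name__ == "__main__":
    # Hypothetical 4-cavity correlation matrix, for illustration only.
    toy = [[1.0, 0.9, 0.5, 0.1],
           [0.9, 1.0, 0.2, 0.3],
           [0.5, 0.2, 1.0, 0.4],
           [0.1, 0.3, 0.4, 1.0]]
    for cavity, z, rank in zscore_ranking(toy, ref=0):
        print(f"cavity_{cavity}: rank {rank}/3, Z = {z:.2f}")
```

Under this assumed definition the background distribution belongs to the reference site, so exchanging the roles of two cavities generally changes the Z-score even though the raw correlation is symmetric: in the toy matrix, cavity 1 scores about 1.22 with cavity 0 as the reference, while `zscore_ranking(toy, ref=1)` gives cavity 0 about 1.40.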

SARS 3C-Like Proteinase. SARS 3CLpro hydrolyzes the SARS polyprotein, plays a key role in viral maturation, and is considered an important drug target against SARS.62, 63 Although SARS 3CLpro functions as a dimer,64 only one protomer is active in the dimer.58 Therefore, the substrate site of each protomer acts as an allosteric site for the other protomer. We detected 15 potential ligand-binding sites in the dimer, including the two substrate-binding sites, cavity_3 and cavity_8. We treated cavity_3 as the orthosteric site and cavity_8 as the allosteric site, and vice versa. When cavity_3 was treated as the orthosteric site, cavity_8 ranked third in correlation with it, with a Z-score of 1.1 (Table 3). When cavity_8 was treated as the orthosteric site, cavity_3 ranked fourth, with a Z-score of 0.8 (Table 3). These two sites were thus highly correlated with each other in both cases. Residues N214 and S284-T285-I286 have been reported to modulate SARS 3CLpro catalysis through dynamic allostery,65, 66 and these residues are contained in cavity_1. When we treated cavity_3 as the orthosteric site, cavity_1 ranked first, with a Z-score of 2.2. When cavity_8 was treated as the orthosteric site, cavity_1 also ranked first, with a Z-score of 2.4. In

both cases, cavity_1 was highly correlated with the orthosteric site, indicating that it is a potential allosteric site of SARS 3CLpro.

Identifying Novel Allosteric Sites in Test Proteins. Our results indicate that the motions of allosteric sites are highly correlated with those of orthosteric sites. We therefore sought to apply motion correlation to predict novel allosteric sites. Different proteins contain different numbers of allosteric sites, and known allosteric proteins have one to four known allosteric sites.32 We began by assuming that cavities with a Z-score > 0.5 were potential allosteric sites. Applying this criterion to the monomeric protein dataset identified a total of 45 potential allosteric binding sites (Figure 2 and Supplementary Information Table S7). In two cases (cavity_4 in CDK2 and cavity_5 in Chk1), mutations in the predicted allosteric sites have been reported to influence kinase function (Figure 3).46, 65 Allosteric inhibitors targeting cavity_4 in CDK2 were recently identified by virtual screening and experimental testing.68, 69 These two cases demonstrate the utility of our prediction method. The other predicted allosteric sites may serve as guides for studying allosteric regulation or for discovering novel allosteric inhibitors and activators of these proteins. Because potential allosteric sites may also be used for allosteric ligand design, the predicted sites should have good druggability scores. For this purpose, we used CavityDrugScore in the CAVITY35 program, an empirical score derived from known ligand-binding structures. Some of the predicted allosteric sites received low CavityDrugScores; these sites might be flat or small in the structures used for the calculations but could potentially open up during MD simulations. Therefore, these sites could be treated as hidden allosteric sites.
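The screening step described above (keeping cavities with Z-score > 0.5 and flagging low-druggability hits as possible hidden sites) can be sketched as follows. The dictionary layout, the function names, and the druggability cutoff `drug_cut` are hypothetical; CavityDrugScore values would come from the CAVITY program and are simply passed in as numbers here. A small helper also shows how a count such as "N of M known sites recovered" can be tallied.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: {protein_name: {cavity_name: value}}
Scores = Dict[str, Dict[str, float]]
Hit = Tuple[str, str, bool]  # (protein, cavity, flagged_as_hidden)


def predict_allosteric_sites(z_scores: Scores, drug_scores: Scores,
                             drug_cut: float, z_cut: float = 0.5) -> List[Hit]:
    """Return every cavity whose Z-score exceeds z_cut.

    The boolean marks a possible "hidden" site: its druggability score is
    below the hypothetical cutoff drug_cut, or no score was supplied.
    """
    hits: List[Hit] = []
    for protein, cavities in z_scores.items():
        for cavity, z in cavities.items():
            if z > z_cut:
                score = drug_scores.get(protein, {}).get(cavity, float("-inf"))
                hits.append((protein, cavity, score < drug_cut))
    return hits


def count_recovered(hits: List[Hit], known: Set[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (number of known (protein, cavity) sites predicted, number of known sites)."""
    predicted = {(protein, cavity) for protein, cavity, _ in hits}
    return len(known & predicted), len(known)
```

The Z-score cutoff of 0.5 is the one stated in the text; the druggability cutoff has no counterpart in the source and would have to be chosen by the user.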


Figure 2. Structures of the cavities predicted as potential allosteric sites. The cavities are shown as surfaces. (A) cavity_2, cavity_7, cavity_9, and cavity_10 in PTP1B; (B) cavity_1 in HK4; (C) cavity_4 in β-lactamase; (D) cavity_4 in CDK2; (E) cavity_1, cavity_3, cavity_6, cavity_8, cavity_12, and cavity_14 in HK1; (F) cavity_1, cavity_6, and cavity_8 in Myosin II heavy chain; (G) cavity_9, cavity_10, cavity_12, and cavity_13 in CK2α; (H) cavity_5 and cavity_7 in Chk1; (I) cavity_4 and cavity_5 in HCV NS5B; (J) cavity_5, cavity_6, and cavity_10 in MAP14; (K) cavity_2 and cavity_7 in MurI; (L) cavity_2 and cavity_3 in IGF-1R; (M) cavity_1 and cavity_12 in KIF11; (N) cavity_7 and cavity_9 in MAP8; (O) cavity_1 in RecA; (P) cavity_3 in c-Abl; (Q) cavity_2 and cavity_7 in Cdc4; (R) cavity_3, cavity_5, cavity_6, and cavity_11 in P450 3A4; (S) cavity_4 and cavity_5 in AceK.

Figure 3. Two predicted allosteric sites are supported by experimental mutagenesis studies. The predicted sites are shown as surfaces, and the residues supported by experimental mutagenesis are shown as labeled sticks. (A) cavity_4 in CDK2, and (B) cavity_5 in Chk1.

Discovery of Novel Allosteric Sites for 15-Lipoxygenase. Activation of 15-lipoxygenase (15-LOX) has been suggested as a novel means to modulate the human arachidonic acid (AA) metabolic network and thereby reduce inflammation.70 Recently, Meng et al. identified a novel allosteric site in human 15-LOX using the MutInf method combined with CAVITY35 pocket detection, and discovered allosteric activators for this site through virtual screening and experimental studies.71 To test whether our method can identify this allosteric site, we used the same human 15-LOX structure as Meng et al. and calculated the Z-score of every eligible cavity with respect to the orthosteric site. These values were then ranked. Among the 21 detected cavities, the recently discovered allosteric site ranked third, with a Z-score of 1.1. We also identified two other cavities with Z-score values larger than that

of the recently discovered allosteric site. We have obtained active compounds using virtual screening and bioassays, and further experiments are in progress. The MutInf method uses MD simulations to sample conformations,71 whereas our method uses only a coarse-grained GNM model to calculate the motion correlation between the two sites.

Comparison of Our Method with Other Allosteric Site Prediction Methods. Using a GNM-based approach, we found that the motions of known allosteric sites and the corresponding orthosteric sites were highly correlated. These correlations do not depend on the conformational or binding states of the protein. Our approach is simpler, faster, and easier to use than allosteric site prediction methods based on MD simulations.18, 25, 72 The two-state Gō model approach uses both the allosteric ligand-bound structure and the apo structure,16 whereas our method requires only one structure of the target protein, regardless of its ligand-binding state. This makes our method more generally applicable for predicting novel allosteric sites. Because allostery can involve long-range action, our method considers the entire protein rather than only the local properties of known allosteric sites, as AlloSite does.15 The work of Panjkovich et al.19, 21 is based on an anisotropic network model, whereas the algorithm used in our method has lower complexity. In addition, our method does not require introducing any artificial objects into the target proteins, and its predictive accuracy is higher. In our test dataset of monomeric proteins, with the criterion Z-score > 0.5, we correctly predicted 23 of the 24 known allosteric sites. We also used the PARS web server21 to predict the 24 known allosteric sites and found that only 13 of them were detected correctly (Supplementary Information Table S8). Goncearenco et al.23 used the binding leverage score, which contains only local information, to predict potential ligand-binding sites, and then used leverage coupling to find allosterically coupled sites. By contrast, our method directly considers correlations between potential ligand-binding sites and orthosteric sites, which requires much less time and computational effort. Previous studies have used coarse-grained models to study allosteric regulation.27, 31, 73-83

However, most of these studies focused on elucidating allosteric pathways rather than on the properties of regulatory sites. Several reports have used coarse-grained models to identify potential allosteric hotspots.31, 74, 79, 80 All of these studies focused on individual residues rather than defined pockets (sites), whereas we used defined pockets in the present study. In those studies, special measures were needed to make known allosteric hotspots stand out significantly from other residues. For example, spring constants in the elastic network model were set according to residue type,31 the surrounding environments of specified residues were considered,74 only the three highest-frequency modes were used for analysis,77 or additional thermodynamic analyses were performed.80 Most of these studies considered only a few example proteins.31, 74, 80 Further studies are required to determine whether their predicted allosteric sites hold in general.
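For readers unfamiliar with the GNM, the residue-level motion correlations that such site-level comparisons build on can be sketched in a few lines of NumPy. This is the textbook GNM cross-correlation: a Kirchhoff (connectivity) matrix from a Cα contact cutoff, its pseudo-inverse built from the nonzero modes, and a normalized covariance. The 7 Å cutoff, the use of all nonzero modes by default, the function names, and averaging residue-pair correlations to obtain a site-site value are assumptions for illustration, not the exact CorrSite settings.

```python
import numpy as np


def gnm_correlations(coords, cutoff=7.0, n_modes=None):
    """Normalized GNM cross-correlation matrix C_ij for Calpha coordinates (n x 3).

    Assumes the contact network is connected, i.e. exactly one zero mode.
    n_modes=None uses all nonzero modes; an integer keeps only the slowest ones.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kirchhoff matrix: -1 for each contact within the cutoff, degree on the diagonal.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)  # eigenvalues in ascending order
    modes = np.arange(1, n)  # skip the single zero mode
    if n_modes is not None:
        modes = modes[:n_modes]
    v = eigvecs[:, modes]
    cov = (v / eigvals[modes]) @ v.T  # pseudo-inverse: sum_k v_k v_k^T / lambda_k
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)


def site_correlation(corr, site_a, site_b):
    """Assumed site-site correlation: mean C_ij over residue pairs (i in A, j in B)."""
    return float(corr[np.ix_(site_a, site_b)].mean())
```

As a quick check, a straight chain of ten residues spaced 3.8 Å apart yields positive correlations between neighbors and anticorrelated motion between the two ends, as expected for the slow bending of a chain.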

CONCLUSIONS

Using the GNM model, we found that the motions of allosteric and orthosteric sites in proteins are highly correlated. These correlations do not depend on the conformational or binding states of proteins. We applied our motion correlation analysis to predict potential allosteric sites in monomeric and oligomeric proteins. Some of the predicted allosteric sites were supported by previous reports or by in-house experimental studies, as in the case of 15-LOX. The corresponding allosteric site prediction program, CorrSite v1.0, can be downloaded from http://mdl.ipc.pku.edu.cn/mdlweb/download.php.

ASSOCIATED CONTENT

Supporting Information

Figure S1. Cases in which relatively high-frequency modes play much more important roles in the motion correlations between known allosteric sites and their corresponding orthosteric sites.

Table S1. Ranking of Z-Score Values and Volumes for All Cavities except the Known Allosteric and Orthosteric Sites in the Test Proteins

Table S2. Rank of Allosteric Site Volume among All Cavities except the Orthosteric Site of the Test Proteins, Rank of the Number of Residues Highly Correlated (Top 30%) with Orthosteric Residues, and Total Rank of the Three Most-Correlated Residues in Each Cavity (Residue Correlations Summed over All Orthosteric-Site Residues)

Table S3. Spearman Correlation Coefficient between Z-Score Value Ranking and Simple Volume Ranking

Table S4. Z-Score and Volume Ranking of Test Proteins using the Apo State Structure


Table S5. Z-Score and Volume Ranking of Test Proteins using only Structures with Bound Orthosteric Effectors

Table S6. Cα Root Mean Square Differences between Different States of Test Proteins

Table S7. Details of Cavities Predicted as Potential Allosteric Sites

Table S8. Flexibility P-Value of Known Allosteric Sites Predicted by PARS

AUTHOR INFORMATION

Corresponding Author * E-mail: [email protected] (L.L.); Fax: (+86)10-62751725.

Notes The authors declare no competing financial interest.


ACKNOWLEDGMENTS

The authors thank Prof. Jian Zhang of Shanghai Jiao Tong University for providing the allosteric protein dataset; Weilin Zhang, Dr. Daqi Yu, and Xingjie Pan for helpful discussions; and Dr. Fangjin Chen for his help with computational resources. This work was supported in part by the Ministry of Science and Technology of China (2015CB910300, 2016YFA0502300, 2012AA020308) and the National Natural Science Foundation of China (21633001).


REFERENCES

1. Tsai, C. J.; Del Sol, A.; Nussinov, R., Protein Allostery, Signal Transmission and Dynamics: a Classification Scheme of Allosteric Mechanisms. Mol. BioSyst. 2009, 5, 207-216.
2. Tsai, C. J.; del Sol, A.; Nussinov, R., Allostery: Absence of a Change in Shape does not Imply that Allostery is not at Play. J. Mol. Biol. 2008, 378, 1-11.
3. Nussinov, R.; Tsai, C. J., Allostery in Disease and in Drug Discovery. Cell 2013, 153, 293-305.
4. Lesne, A.; Foray, N.; Cathala, G.; Forne, T.; Wong, H.; Victor, J. M., Chromatin Fiber Allostery and the Epigenetic Code. J. Phys.: Condens. Matter 2015, 27, 064114.
5. Csermely, P.; Palotai, R.; Nussinov, R., Induced Fit, Conformational Selection and Independent Dynamic Segments: an Extended View of Binding Events. Trends Biochem. Sci. 2010, 35, 539-546.
6. Cui, Q.; Karplus, M., Allostery and Cooperativity Revisited. Protein Sci. 2008, 17, 1295-1307.
7. Pan, Y.; Tsai, C. J.; Ma, B.; Nussinov, R., Mechanisms of Transcription Factor Selectivity. Trends Genet. 2010, 26, 75-83.
8. Macdonald, J. A.; Storey, K. B., Temperature and Phosphate Effects on Allosteric Phenomena of Phosphofructokinase from a Hibernating Ground Squirrel (Spermophilus lateralis). FEBS J. 2005, 272, 120-128.
9. Strickland, D.; Moffat, K.; Sosnick, T. R., Light-activated DNA Binding in a Designed Allosteric Protein. Proc. Natl. Acad. Sci. USA 2008, 105, 10709-10714.
10. Peracchi, A.; Mozzarelli, A., Exploring and Exploiting Allostery: Models, Evolution, and Drug Targeting. Biochim. Biophys. Acta 2011, 1814, 922-933.
11. Gunasekaran, K.; Ma, B.; Nussinov, R., Is Allostery an Intrinsic Property of all Dynamic Proteins? Proteins: Struct., Funct., Bioinf. 2004, 57, 433-443.
12. Shen, Q.; Wang, G.; Li, S.; Liu, X.; Lu, S.; Chen, Z.; Song, K.; Yan, J.; Geng, L.; Huang, Z.; Huang, W.; Chen, G.; Zhang, J., ASD v3.0: Unraveling Allosteric Regulation with Structural Mechanisms and Biological Networks. Nucleic Acids Res. 2016, 44, D527-D535.
13. Demerdash, O. N.; Daily, M. D.; Mitchell, J. C., Structure-based Predictive Models for Allosteric Hot Spots. PLoS Comput. Biol. 2009, 5, e1000531.
14. Le Guilloux, V.; Schmidtke, P.; Tuffery, P., Fpocket: an Open Source Platform for Ligand Pocket Detection. BMC Bioinf. 2009, 10, 168.
15. Huang, W.; Lu, S.; Huang, Z.; Liu, X.; Mou, L.; Luo, Y.; Zhao, Y.; Liu, Y.; Chen, Z.; Hou, T.; Zhang, J., Allosite: a Method for Predicting Allosteric Sites. Bioinformatics 2013, 29, 2357-2359.
16. Qi, Y. F.; Wang, Q.; Tang, B.; Lai, L. H., Identifying Allosteric Binding Sites in Proteins with a Two-State Gō Model for Novel Allosteric Effector Discovery. J. Chem. Theory Comput. 2012, 8, 2962-2971.
17. Wang, Q.; Qi, Y.; Yin, N.; Lai, L., Discovery of Novel Allosteric Effectors based on the Predicted Allosteric Sites for Escherichia coli D-3-phosphoglycerate Dehydrogenase. PLoS One 2014, 9, e94829.
18. Ma, X.; Qi, Y.; Lai, L., Allosteric Sites can be Identified based on the Residue-residue Interaction Energy Difference. Proteins: Struct., Funct., Bioinf. 2015, 83, 1375-1384.
19. Panjkovich, A.; Daura, X., Exploiting Protein Flexibility to Predict the Location of Allosteric Sites. BMC Bioinf. 2012, 13, 273.
20. Huang, B.; Schroeder, M., LIGSITEcsc: Predicting Ligand Binding Sites using the Connolly Surface and Degree of Conservation. BMC Struct. Biol. 2006, 6, 19.
21. Panjkovich, A.; Daura, X., PARS: a Web Server for the Prediction of Protein Allosteric and Regulatory Sites. Bioinformatics 2014, 30, 1314-1315.
22. Greener, J. G.; Sternberg, M. J., AlloPred: Prediction of Allosteric Pockets on Proteins using Normal Mode Perturbation Analysis. BMC Bioinf. 2015, 16, 335.
23. Goncearenco, A.; Mitternacht, S.; Yong, T.; Eisenhaber, B.; Eisenhaber, F.; Berezovsky, I. N., SPACER: Server for Predicting Allosteric Communication and Effects of Regulation. Nucleic Acids Res. 2013, 41, W266-W272.

24. Mitternacht, S.; Berezovsky, I. N., Binding Leverage as a Molecular basis for Allosteric Regulation. PLoS Comput. Biol. 2011, 7, e1002148.
25. McClendon, C. L.; Friedland, G.; Mobley, D. L.; Amirkhani, H.; Jacobson, M. P., Quantifying Correlations Between Allosteric Sites in Thermodynamic Ensembles. J. Chem. Theory Comput. 2009, 5, 2486-2502.
26. Suel, G. M.; Lockless, S. W.; Wall, M. A.; Ranganathan, R., Evolutionarily Conserved Networks of Residues Mediate Allosteric Communication in Proteins. Nat. Struct. Biol. 2003, 10, 59-69.
27. Gerek, Z. N.; Ozkan, S. B., Change in Allosteric Network Affects Binding Affinities of PDZ Domains: Analysis through Perturbation Response Scanning. PLoS Comput. Biol. 2011, 7, e1002154.
28. Ota, N.; Agard, D. A., Intramolecular Signaling Pathways Revealed by Modeling Anisotropic Thermal Diffusion. J. Mol. Biol. 2005, 351, 345-354.
29. Bhattacharyya, M.; Ghosh, A.; Hansia, P.; Vishveshwara, S., Allostery and Conformational Free Energy Changes in Human Tryptophanyl-tRNA Synthetase from Essential Dynamics and Structure Networks. Proteins: Struct., Funct., Bioinf. 2010, 78, 506-517.
30. Ribeiro, A. A.; Ortiz, V., Energy Propagation and Network Energetic Coupling in Proteins. J. Phys. Chem. B 2015, 119, 1835-1846.
31. Williams, G., Elastic Network Model of Allosteric Regulation in Protein Kinase PDK1. BMC Struct. Biol. 2010, 10, 11.
32. Huang, W.; Wang, G.; Shen, Q.; Liu, X.; Lu, S.; Geng, L.; Huang, Z.; Zhang, J., ASBench: Benchmarking Sets for Allosteric Discovery. Bioinformatics 2015, 31, 2598-2600.
33. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E., The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-242.
34. UniProt Consortium, UniProt: a Hub for Protein Information. Nucleic Acids Res. 2015, 43, D204-D212.
35. Yuan, Y. X.; Pei, J. F.; Lai, L. H., Binding Site Detection and Druggability Prediction of Protein Targets for Structure-Based Drug Design. Curr. Pharm. Des. 2013, 19, 2326-2333.

36. Haliloglu, T.; Bahar, I.; Erman, B., Gaussian Dynamics of Folded Proteins. Phys. Rev. Lett. 1997, 79, 3090-3093.
37. Yang, L. W.; Eyal, E.; Chennubhotla, C.; Jee, J.; Gronenborn, A. M.; Bahar, I., Insights into Equilibrium Dynamics of Proteins from Comparison of NMR and X-ray Data with Computational Predictions. Structure 2007, 15, 741-749.
38. Bahar, I.; Jernigan, R. L., Inter-residue Potentials in Globular Proteins and the Dominance of Highly Specific Hydrophilic Interactions at Close Separation. J. Mol. Biol. 1997, 266, 195-214.
39. Miyazawa, S.; Jernigan, R. L., Residue-residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. J. Mol. Biol. 1996, 256, 623-644.
40. Bahar, I.; Atilgan, A. R.; Erman, B., Direct Evaluation of Thermal Fluctuations in Proteins using a Single-parameter Harmonic Potential. Folding Des. 1997, 2, 173-181.
41. Flory, P. J., Statistical Thermodynamics of Random Networks. Proc. R. Soc. London, Ser. A 1976, 351, 351-380.
42. Kloczkowski, A.; Mark, J. E.; Erman, B., Chain Dimensions and Fluctuations in Random Elastomeric Networks. 1. Phantom Gaussian Networks in the Undeformed State. Macromolecules 1989, 22, 1423-1432.
43. Doruker, P.; Atilgan, A. R.; Bahar, I., Dynamics of Proteins Predicted by Molecular Dynamics Simulations and Analytical Approaches: Application to alpha-amylase Inhibitor. Proteins: Struct., Funct., Bioinf. 2000, 40, 512-524.
44. Ma, W. Z.; Tang, C.; Lai, L. H., Specificity of Trypsin and Chymotrypsin: Loop-motion-controlled Dynamic Correlation as a Determinant. Biophys. J. 2005, 89, 1183-1193.
45. Bren, U.; Oostenbrink, C., Cytochrome P450 3A4 Inhibition by Ketoconazole: Tackling the Problem of Ligand Cooperativity using Molecular Dynamics Simulations and Free-energy Calculations. J. Chem. Inf. Model. 2012, 52, 1573-1582.
46. Bren, U.; Fuchs, J. E.; Oostenbrink, C., Cooperative Binding of Aflatoxin B1 by Cytochrome P450 3A4: a Computational Study. Chem. Res. Toxicol. 2014, 27, 2136-2147.

47. Moeslein, F. M.; Myers, M. P.; Landreth, G. E., The CLK Family Kinases, CLK1 and CLK2, Phosphorylate and Activate the Tyrosine Phosphatase, PTP-1B. J. Biol. Chem. 1999, 274, 26697-26704.
48. Lolli, G.; Johnson, L. N., Recognition of Cdk2 by Cdk7. Proteins: Struct., Funct., Bioinf. 2007, 67, 1048-1059.
49. Gu, Y.; Rosenblatt, J.; Morgan, D. O., Cell Cycle Regulation of CDK2 Activity by Phosphorylation of Thr160 and Tyr15. EMBO J. 1992, 11, 3995-4005.
50. UniProt. http://www.uniprot.org/uniprot/P08799 (accessed April 29, 2015).
51. Lundqvist, T.; Fisher, S. L.; Kern, G.; Folmer, R. H.; Xue, Y.; Newton, D. T.; Keating, T. A.; Alm, R. A.; de Jonge, B. L., Exploitation of Structural and Regulatory Diversity in Glutamate Racemases. Nature 2007, 447, 817-822.
52. Nussinov, R.; Tsai, C. J.; Liu, J., Principles of Allosteric Interactions in Cell Signaling. J. Am. Chem. Soc. 2014, 136, 17692-17701.
53. Bahar, I.; Atilgan, A. R.; Demirel, M. C.; Erman, B., Vibrational Dynamics of Folded Proteins: Significance of Slow and Fast Motions in Relation to Function and Stability. Phys. Rev. Lett. 1998, 80, 2733-2736.
54. Jones, S.; Thornton, J. M., Principles of Protein-protein Interactions. Proc. Natl. Acad. Sci. USA 1996, 93, 13-20.
55. Marianayagam, N. J.; Sunde, M.; Matthews, J. M., The Power of Two: Protein Dimerization in Biology. Trends Biochem. Sci. 2004, 29, 618-625.
56. Gabizon, R.; Friedler, A., Allosteric Modulation of Protein Oligomerization: an Emerging Approach to Drug Design. Front. Chem. 2014, 2, 9.
57. Krieger, J.; Bahar, I.; Greger, I. H., Structure, Dynamics, and Allosteric Potential of Ionotropic Glutamate Receptor N-Terminal Domains. Biophys. J. 2015, 109, 1136-1148.
58. Chen, H.; Wei, P.; Huang, C. K.; Tan, L.; Liu, Y.; Lai, L. H., Only One Protomer is Active in the Dimer of SARS 3C-like Proteinase. J. Biol. Chem. 2006, 281, 13894-13898.
59. Busby, S.; Ebright, R. H., Transcription Activation by Catabolite Activator Protein (CAP). J. Mol. Biol. 1999, 293, 199-213.

60. Lawson, C. L.; Swigon, D.; Murakami, K. S.; Darst, S. A.; Berman, H. M.; Ebright, R. H., Catabolite Activator Protein: DNA Binding and Transcription Activation. Curr. Opin. Struct. Biol. 2004, 14, 10-20.
61. Napoli, A. A.; Lawson, C. L.; Ebright, R. H.; Berman, H. M., Indirect Readout of DNA Sequence at the Primary-kink Site in the CAP-DNA Complex: Recognition of Pyrimidine-purine and Purine-purine Steps. J. Mol. Biol. 2006, 357, 173-183.
62. Anand, K.; Ziebuhr, J.; Wadhwani, P.; Mesters, J. R.; Hilgenfeld, R., Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs. Science 2003, 300, 1763-1767.
63. Yang, H.; Yang, M.; Ding, Y.; Liu, Y.; Lou, Z.; Zhou, Z.; Sun, L.; Mo, L.; Ye, S.; Pang, H.; Gao, G. F.; Anand, K.; Bartlam, M.; Hilgenfeld, R.; Rao, Z., The Crystal Structures of Severe Acute Respiratory Syndrome Virus Main Protease and its Complex with an Inhibitor. Proc. Natl. Acad. Sci. USA 2003, 100, 13190-13195.
64. Fan, K.; Wei, P.; Feng, Q.; Chen, S.; Huang, C.; Ma, L.; Lai, B.; Pei, J.; Liu, Y.; Chen, J.; Lai, L., Biosynthesis, Purification, and Substrate Specificity of Severe Acute Respiratory Syndrome Coronavirus 3C-like Proteinase. J. Biol. Chem. 2004, 279, 1637-1642.
65. Shi, J.; Song, J., The Catalysis of the SARS 3C-like Protease is Under Extensive Regulation by its Extra Domain. FEBS J. 2006, 273, 1035-1045.
66. Lim, L.; Shi, J.; Mu, Y.; Song, J., Dynamically-driven Enhancement of the Catalytic Machinery of the SARS 3C-like Protease by the S284-T285-I286/A Mutations on the Extra Domain. PLoS One 2014, 9, e101941.
67. Sanchez, Y.; Wong, C.; Thoma, R. S.; Richman, R.; Wu, Z.; Piwnica-Worms, H.; Elledge, S. J., Conservation of the Chk1 Checkpoint Pathway in Mammals: Linkage of DNA Damage to Cdk Regulation through Cdc25. Science 1997, 277, 1497-1501.
68. Chen, H.; Van Duyne, R.; Zhang, N.; Kashanchi, F.; Zeng, C., A Novel Binding Pocket of Cyclin-dependent Kinase 2. Proteins: Struct., Funct., Bioinf. 2009, 74, 122-132.
69. Hu, Y.; Li, S.; Liu, F.; Geng, L.; Shu, X.; Zhang, J., Discovery of Novel Nonpeptide Allosteric Inhibitors Interrupting the Interaction of CDK2/cyclin A3 by Virtual Screening and Bioassays. Bioorg. Med. Chem. Lett. 2015, 25, 4069-4073.
70. Yang, K.; Ma, W.; Liang, H.; Ouyang, Q.; Tang, C.; Lai, L., Dynamic Simulations on the Arachidonic Acid Metabolic Network. PLoS Comput. Biol. 2007, 3, e55.

71. Meng, H.; McClendon, C. L.; Dai, Z.; Li, K.; Zhang, X.; He, S.; Shang, E.; Liu, Y.; Lai, L., Discovery of Novel 15-Lipoxygenase Activators To Shift the Human Arachidonic Acid Metabolic Network toward Inflammation Resolution. J. Med. Chem. 2016, 59, 4202-4209.
72. Miao, Y.; Nichols, S. E.; McCammon, J. A., Mapping of Allosteric Druggable Sites in Activation-associated Conformers of the M2 Muscarinic Receptor. Chem. Biol. Drug Des. 2014, 83, 237-246.
73. del Sol, A.; Fujihashi, H.; Amoros, D.; Nussinov, R., Residues Crucial for Maintaining Short Paths in Network Communication Mediate Signaling in Proteins. Mol. Syst. Biol. 2006, 2, 2006.0019.
74. Balabin, I. A.; Yang, W.; Beratan, D. N., Coarse-grained Modeling of Allosteric Regulation in Protein Receptors. Proc. Natl. Acad. Sci. USA 2009, 106, 14253-14258.
75. Tehver, R.; Chen, J.; Thirumalai, D., Allostery Wiring Diagrams in the Transitions that Drive the GroEL Reaction Cycle. J. Mol. Biol. 2009, 387, 390-406.
76. Yang, Z.; Majek, P.; Bahar, I., Allosteric Transitions of Supramolecular Systems Explored by Network Models: Application to Chaperonin GroEL. PLoS Comput. Biol. 2009, 5, e1000360.
77. Daily, M. D.; Gray, J. J., Allosteric Communication Occurs via Networks of Tertiary and Quaternary Motions in Proteins. PLoS Comput. Biol. 2009, 5, e1000293.
78. Dykeman, E. C.; Twarock, R., All-atom Normal-mode Analysis Reveals an RNA-induced Allostery in a Bacteriophage Coat Protein. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2010, 81, 031908.
79. Ozbek, P.; Soner, S.; Haliloglu, T., Hot Spots in a Network of Functional Sites. PLoS One 2013, 8, e74320.
80. Su, J. G.; Qi, L. S.; Li, C. H.; Zhu, Y. Y.; Du, H. J.; Hou, Y. X.; Hao, R.; Wang, J. H., Prediction of Allosteric Sites on Protein Surfaces with an Elastic-network-model-based Thermodynamic Method. Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 2014, 90, 022719.
81. Na, H.; Jernigan, R. L.; Song, G., Bridging between NMA and Elastic Network Models: Preserving All-Atom Accuracy in Coarse-Grained Models. PLoS Comput. Biol. 2015, 11, e1004542.

82. Zheng, W.; Brooks, B. R.; Thirumalai, D., Low-frequency Normal Modes that Describe Allosteric Transitions in Biological Nanomachines are Robust to Sequence Variations. Proc. Natl. Acad. Sci. USA 2006, 103, 7664-7669.
83. Zheng, W.; Brooks, B. R.; Thirumalai, D., Allosteric Transitions in the Chaperonin GroEL are Captured by a Dominant Normal Mode that is Most Robust to Sequence Variations. Biophys. J. 2007, 93, 2289-2299.
