
Perspective

Systems Approaches to Understanding and Designing Allosteric Proteins
Srivatsan Raman
Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b01094 • Publication Date (Web): 13 Dec 2017
Downloaded from http://pubs.acs.org on December 14, 2017


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Biochemistry

Systems Approaches to Understanding and Designing Allosteric Proteins

Srivatsan Raman¹,²

¹ Department of Biochemistry, University of Wisconsin-Madison, Madison, WI
² Department of Bacteriology, University of Wisconsin-Madison, Madison, WI

Email: [email protected]

ACS Paragon Plus Environment


ABSTRACT: The study of allostery has a central place in biology because of the myriad roles of allosteric proteins in cellular function. As technologies for probing biomolecules with high spatiotemporal resolution have become increasingly sophisticated, so has our understanding of the diverse structural and molecular mechanisms of allosteric proteins. Studies have shown that allosteric signals are transmitted through a network of residue-residue interactions connecting distal sites on a protein. Linking structural and dynamical changes to the functional role of individual residues will give a more complete molecular view of allostery. In this article, we highlight new mutational technologies that enable a systems-level, quantitative description of allostery by dissecting the role of individual residues through large-scale functional screens. A molecular model that predicts allosteric hotspots can be developed by applying statistical tools to the resulting large sequence-structure-function datasets. Design of allosteric proteins with new function is essential for engineering biological systems. Previous design efforts demonstrate that the allosteric network is a powerful functional constraint in the design of novel or enhanced allosteric proteins. We discuss how a priori knowledge of the allosteric network could improve rational design by enabling better navigation of design space. Understanding the molecular ‘rules’ governing allostery would elucidate the molecular basis of dysfunction in disease-associated allosteric proteins, provide a means to design tailored therapeutics, and enable the design of new sensors and enzymes for synthetic biology.


INTRODUCTION: Allostery is a property of proteins in which a perturbation at one site causes a functional effect at a distant site. The study of allostery has a central place in biology because of the diverse cellular roles of allosteric proteins, which include signaling, catalysis, gene regulation, and transport. Designing allosteric proteins is of high interest in biological engineering to create better enzymes, rewire signaling pathways, build synthetic gene circuits, and sense environmental signals. Understanding the molecular mechanism of allostery is not only a fundamental question in structural biology but also enables the rational design of allosteric proteins with new functions. In the first half of this article, we describe emerging experimental and computational tools to understand, quantify, and predict the molecular drivers of allostery. Protein science is entering a transformative phase fueled by high-throughput mutational methods that comprehensively map the functional landscape at the resolution of individual residues. High-throughput functional screening can determine, in an unbiased manner, the residue-residue interactions that mediate allosteric signaling. This approach also generates large experimental datasets well suited to predicting residues important for allostery using machine learning. In the second half, we describe strategies for designing allostery. We review previous design efforts, classified by the scale of the design perturbation. We suggest that a better understanding of the role of individual residues in effector or substrate recognition, allosteric signaling, and stability could lead to greater success in the rational design of allosteric proteins through more effective navigation of design space.

UNDERSTANDING THE MOLECULAR BASIS OF ALLOSTERY

When an allosteric protein is perturbed by binding an effector at a site called the allosteric site, the perturbation generally triggers a conformational change that then regulates a distally located active site.
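The effector-driven switch just described is often formalized with the two-state MWC model introduced below. The following Python sketch computes the fraction of molecules in the active (R) state as a function of effector concentration. It assumes n identical effector sites, dissociation constants K_R and K_T for the active and inactive states, and an effector-free equilibrium constant L = [T]/[R]; the parameter values are illustrative assumptions, not data for any protein discussed here.

```python
def fraction_active(effector: float, L: float, K_R: float, K_T: float, n: int) -> float:
    """Fraction of protein in the active (R) state under the two-state MWC model.

    effector: free effector concentration (same units as K_R and K_T)
    L: [T]/[R] equilibrium constant in the absence of effector
    K_R, K_T: effector dissociation constants for the R and T states
    n: number of identical effector-binding sites
    """
    r_weight = (1.0 + effector / K_R) ** n      # statistical weight of the active (R) state
    t_weight = L * (1.0 + effector / K_T) ** n  # statistical weight of the inactive (T) state
    return r_weight / (r_weight + t_weight)


if __name__ == "__main__":
    # Illustrative parameters only: the inactive state dominates without effector,
    # and the effector binds the active state 100-fold more tightly.
    for x in (0.0, 1.0, 10.0, 100.0):
        print(x, round(fraction_active(x, L=1000.0, K_R=1.0, K_T=100.0, n=4), 4))
```

Because the effector binds the active state more tightly (K_R < K_T), raising its concentration moves the population from the inactive to the active state, which is the equilibrium-shift picture of allostery in its simplest form.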
Perturbations can be caused by non-covalent binding of an effector, such as another protein, a small molecule, or a nucleic acid, or by covalent changes such as mutations and post-translational modifications1. Allostery is thought to have evolved from weakly interacting protein pairs that co-localize on DNA, on the plasma membrane, within microcompartments, or through other biological processes that increase the local concentration of both proteins. Mutations may subsequently increase their affinity and lead to the evolution of allosteric coupling between sites on both proteins2. Classically, allostery is defined by a conformational change between the inactive and active protein states upon effector binding. The MWC model proposed a mechanism of allostery in which this conformational change is a concerted, all-or-none event3. In contrast, the KNF model proposed that the allosteric transition occurs through sequential structural changes leading from the inactive to the active state4. These reductionist models, while simple and elegant, did not capture the conformational heterogeneity of proteins in either state. The ensemble view of allostery was subsequently developed, positing that allosteric proteins exist as an ensemble of states and that effector binding shifts the equilibrium toward an active state5,6. However, these models capture only the broad transition between protein states and do not explain allostery at the molecular level. Allostery is a remarkable property that allows two sites within a protein to communicate despite being separated by large distances. Understanding the molecular basis of this communication has been a longstanding challenge in the field. Several studies have
shown that a network of residue-residue interactions propagates the allosteric signal from the effector-binding site to the active site7–9. These findings raise many interesting questions. What properties make a residue important for allostery? What are the energetics of the residue-residue interactions that drive allostery? How do mutations disrupt allostery? Can we predict the allosteric pathway in a protein? Answers to these questions will provide deep insight into how proteins function in normal physiology and in disease, and will facilitate the rational design of potent and selective allosteric drugs. In this section, we briefly review advances in computational and experimental tools to probe the network of residues linking the effector-binding and active sites. We highlight emerging high-throughput technologies that enable a systems-level, quantitative description of allostery through large-scale functional screens. Machine learning on such large experimental datasets offers an opportunity to identify the underlying molecular rules of allostery, which could lead to a model that predicts allosteric residues and their connectivity. These tools would help us understand the molecular basis of allostery and may be applied in biomedicine to discover allosteric sites for drug targeting, develop better molecular models of disease, and understand the functional effects of mutations in disease-associated allosteric proteins10.

Computational and Experimental Tools

Two commonly used, complementary approaches to study allosteric pathways at high resolution are molecular dynamics (MD) simulation and statistical coupling analysis (SCA). If structures of the active and inactive states are available, MD simulations can predict the allosteric transition between the two states by computing the physical forces between atoms.
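To illustrate what computing physical forces between atoms means in practice, and why long timescales are costly, here is a toy velocity Verlet integrator for a single harmonic "bond" in one dimension. This is a sketch under stated assumptions: real MD engines use full force fields, thermostats, and three-dimensional coordinates, and the unit mass, unit spring constant, and the 2 fs time step used for the step-count arithmetic are illustrative choices.

```python
def velocity_verlet(x0, v0, mass, k, dt, n_steps):
    """Integrate a 1-D harmonic oscillator (force F = -k * x) with velocity Verlet.

    Returns the trajectory as a list of (position, velocity) frames,
    one frame per time step, analogous to frames of an MD trajectory.
    """
    x, v = x0, v0
    a = -k * x / mass  # acceleration from the force at the starting position
    frames = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # advance position
        a_new = -k * x / mass               # recompute the force at the new position
        v = v + 0.5 * (a + a_new) * dt      # advance velocity with the averaged acceleration
        a = a_new
        frames.append((x, v))
    return frames


def steps_needed(simulated_time_s, dt_s=2e-15):
    """Number of integration steps to reach a target simulated time (2 fs step assumed)."""
    return round(simulated_time_s / dt_s)


if __name__ == "__main__":
    traj = velocity_verlet(x0=1.0, v0=0.0, mass=1.0, k=1.0, dt=0.01, n_steps=1000)
    energy = [0.5 * v * v + 0.5 * x * x for x, v in traj]  # valid because mass = k = 1
    print(f"energy drift: {max(energy) - min(energy):.2e}")
    print(f"steps for 1 microsecond: {steps_needed(1e-6):.1e}")
    print(f"steps for 1 millisecond: {steps_needed(1e-3):.1e}")
```

At a 2 fs step, reaching 1 µs takes about 5 × 10^8 steps and 1 ms about 5 × 10^11 steps, which is the practical reason the microsecond-to-millisecond regime relevant to allostery is hard to reach with conventional MD.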
MD simulations provide a frame-by-frame atomic trajectory and make it straightforward to probe different effectors, changes in pH, and the effects of mutations. However, the main limitation of MD is the difficulty of simulating the microsecond-to-millisecond timescale on which motions relevant to allostery occur. To address this, accelerated MD methods speed up allosteric motions by several orders of magnitude to shorten simulation timescales11. These results must, however, be experimentally validated to link in silico changes to a physiologically observed allosteric trajectory. Statistical coupling analysis takes a complementary approach to deciphering allosteric pathways. It is based on the premise that evolutionarily co-varying residues that are not adjacent in three-dimensional structure are likely to be allosterically coupled pairs. Ranganathan and coworkers demonstrated that, in various proteins, such co-varying pairs computed from multiple sequence alignments formed contiguous networks of residues linking distal sites12,13. The implicit assumption in SCA is that the inferred allosteric pathway is common to all proteins within the family. However, allosteric pathways may differ within a family. Furthermore, studies have shown evidence for multiple allosteric pathways within a single protein. Activation of the glutamine amidotransferase imidazole glycerol phosphate synthase (IGPS) by one of its multiple allosteric effectors induces widespread dynamic changes in the protein14. Allosteric “hotspots”, or residues involved in allostery, identified in IGPS are ligand-specific, suggesting the presence of multiple native allosteric networks in this one protein. In addition to MD and SCA, other computational approaches to study allostery include Monte Carlo structure refinement, graph-based analysis of residue connectivity in the inactive and active states, normal mode analysis, and elastic network modeling. Experimental tools have likewise been employed to study both the structure and the dynamics of allostery.
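The graph-based view of residue connectivity mentioned above can be sketched in a few lines: residues become nodes, residues in spatial contact become edges, and a shortest path between an effector-site residue and an active-site residue is one candidate communication route. Assumptions: one representative coordinate per residue, an 8 Å contact cutoff, unweighted edges, and made-up toy coordinates. Published methods typically weight edges by interaction strength or correlated motion, and comparing inactive and active states would mean building one graph per structure.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=8.0):
    """Adjacency list linking residues whose coordinates lie within `cutoff` (Å)."""
    graph = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search for a shortest residue path; None if the sites are disconnected."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back from target to source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Hypothetical toy coordinates (Å); residue 5 is far from everything else.
    toy_coords = [(0, 0, 0), (6, 0, 0), (12, 0, 0), (12, 6, 0), (12, 12, 0), (30, 30, 30)]
    g = contact_graph(toy_coords)
    print(shortest_path(g, 0, 4))  # effector-site residue 0 to active-site residue 4
    print(shortest_path(g, 0, 5))  # no path to the isolated residue
```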
X-ray crystallography is commonly used to compare structural changes between the inactive and active states. A comprehensive survey comparing the active- and inactive-state crystal structures of 51 allosteric proteins showed that, on average, 20% of a protein undergoes
a local structural change in the allosteric transition15. The magnitude of structural change, however, varies significantly among proteins, from large domain-level motions to subtle perturbations. Moreover, static images that lack protein dynamics limit the role of crystallography in studying allostery. Protein dynamics on the microsecond-to-millisecond timescale is a defining feature of allostery. Before the advent of sophisticated NMR techniques, Cooper and Dryden postulated, using principles of statistical thermodynamics, that allostery may occur through changes in protein dynamics alone, even in the absence of an observable conformational change16. Estimates show that a rigidification of only 2% in conformational flexibility upon effector binding, which may be barely detectable experimentally, generates sufficient free energy for allosteric coupling17. In catabolite activator protein, conformational entropy alone is thought to drive allostery in the absence of an observable structural change18. Biophysical techniques often used to study allostery, such as fluorescence resonance energy transfer and hydrogen-deuterium exchange kinetics, provide an overall measure of dynamics without atomic-level structural detail. NMR is a versatile tool to study the structural and dynamical properties of allosteric proteins at high resolution. Relaxation dispersion (RD) is a standard NMR technique employed to study the thermodynamic, kinetic, and structural features of allosteric processes on the microsecond-to-millisecond timescale19. RD maps allosteric pathways by identifying residues with matching exchange rates and populations. However, residue motion alone does not imply a functional role in allostery. Because proteins are dynamic macromolecules, many regions move on similar timescales, and these motions may or may not be relevant to allosteric communication. While dynamics offers vital clues, validating the functional contribution of a residue would provide strong evidence of its role in allosteric communication.
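The idea that relaxation dispersion groups residues by shared exchange parameters can be sketched as a simple clustering step. This assumes per-residue exchange rates (k_ex, in s^-1) and minor-state populations (p_B) have already been fitted from dispersion curves. The residue names, fitted values, and the 20% and 0.01 tolerances are hypothetical; real analyses usually test whether a global fit with shared parameters is statistically justified rather than applying fixed tolerances.

```python
def group_by_exchange(residues, rate_tol=0.20, pop_tol=0.01):
    """Group residues whose fitted exchange parameters agree with a group's seed residue.

    residues: list of (name, k_ex in s^-1, p_B minor-state population)
    A residue joins the first existing group whose seed has k_ex within `rate_tol`
    (fractional) and p_B within `pop_tol`; otherwise it seeds a new group.
    """
    groups = []
    for name, kex, pb in sorted(residues, key=lambda r: r[1]):  # process by increasing k_ex
        for group in groups:
            seed_kex, seed_pb = group["seed"]
            if abs(kex - seed_kex) <= rate_tol * seed_kex and abs(pb - seed_pb) <= pop_tol:
                group["members"].append(name)
                break
        else:
            groups.append({"seed": (kex, pb), "members": [name]})
    return [group["members"] for group in groups]


if __name__ == "__main__":
    # Hypothetical fitted values, for illustration only.
    fitted = [
        ("V12", 1500.0, 0.05),
        ("L45", 1600.0, 0.052),
        ("I78", 1450.0, 0.048),
        ("A90", 4000.0, 0.02),
        ("T102", 4200.0, 0.021),
        ("K7", 1550.0, 0.12),
    ]
    print(group_by_exchange(fitted))
```

K7 has a similar exchange rate to the first group but a very different population, so it stays separate; this mirrors the point in the text that matching motion alone is a clue, not proof, of a shared functional network.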
Emerging Tools

Deep mutational scanning: Understanding the relationship between the dynamics and the function of individual residues is at the core of elucidating the molecular basis of allostery. This requires understanding how perturbing function affects the residue-level dynamic profile of the entire protein. Holliday et al. elegantly used this approach to map the allosteric role of a residue distal to the active site of the enzyme cyclophilin A20. They compared the dynamic profiles of the wild-type enzyme and a distal-residue mutant by NMR and related the activation of different allosteric pathways to changes in catalytic activity. This example illustrates the power of mutagenesis to relate function and dynamics at the molecular level. However, it is difficult to predict whether a given mutation will affect the dynamics of allostery. Instead, by first screening for mutations that affect function, one can then study the dynamical properties of a subset of mutants by NMR to obtain deeper atomic insight. Deep mutational scanning is an emerging technology to study proteins through massively parallel functional assays21,22. It leverages advances in DNA synthesis and sequencing to comprehensively mutagenize and sequence large protein libraries. By quantifying the activity of each mutant with a massively multiplexed screen or genetic selection, the functional role of individual residues can be determined. Deep mutational scanning can identify allosteric hotspots through mutations that constitutively lock the protein in the active or inactive state. By tipping the thermodynamic balance, some mutations stabilize either the active or the inactive state, revealing the functional role of the residue in allostery. In heroic and pioneering work using the limited mutational techniques of the time, the Miller group discovered allosteric hotspots of the Lac repressor by clonally testing 4000 point mutants23.
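A common way to turn deep mutational scanning read counts into per-variant scores is a log2 enrichment ratio normalized to wild type; scoring the same library with and without effector then separates constitutively active variants from inactive (non-inducible) ones. This is a minimal sketch: it assumes a selection that enriches cells whose output is ON, and the pseudocount, the cutoff, the variant labels, and the counts are all hypothetical. The source does not specify a scoring scheme.

```python
import math


def enrichment(pre, post, wt_pre, wt_post, pseudocount=0.5):
    """log2 of a variant's post/pre-selection count ratio relative to wild type's ratio."""
    variant_ratio = (post + pseudocount) / (pre + pseudocount)
    wt_ratio = (wt_post + pseudocount) / (wt_pre + pseudocount)
    return math.log2(variant_ratio / wt_ratio)


def classify(score_minus, score_plus, cutoff=1.0):
    """Call an allosteric phenotype from scores without (-) and with (+) effector.

    Both scores are relative to wild type in the same condition, so wild type
    scores 0 in each. The cutoff (log2 units) is a hypothetical threshold.
    """
    if score_minus >= cutoff:
        return "constitutively active"  # ON even without effector
    if score_plus <= -cutoff:
        return "inactive"  # fails to switch ON when effector is added
    return "wild-type-like"


if __name__ == "__main__":
    # Hypothetical counts: (pre-selection, post-selection -effector, post-selection +effector)
    wt = (1000, 100, 1000)
    variants = {"var1": (800, 900, 850), "var2": (900, 80, 50), "var3": (1000, 110, 950)}
    for name, (pre, minus, plus) in variants.items():
        s_minus = enrichment(pre, minus, wt[0], wt[1])
        s_plus = enrichment(pre, plus, wt[0], wt[2])
        print(name, round(s_minus, 2), round(s_plus, 2), classify(s_minus, s_plus))
```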
More recently, Lac repressor point mutants were functionally assayed by
deep mutational scanning to build a more comprehensive catalog of allosteric hotspots24. Because deep mutational scanning is now relatively easy, allosteric hotspots can be compared across multiple homologous proteins to study patterns of allostery within a family. Deep mutational scanning can also systematically uncover long-distance interacting pairs by screening for double mutants that restore normal allosteric function to either a constitutively active or a constitutively inactive single mutant. These function-restoring double mutants would define a putative allosteric pathway linking residues between the effector-binding and active sites. Secondary allosteric pathways that may not be visible to NMR, and that may be conditionally activated in a mutant when the major pathway is disrupted, would come to light in such a mutational screen. Comparing the dynamic profiles of the wild type and the double mutants could reveal different allosteric networks in proteins that are phenotypically identical but propagate the allosteric signal in different ways. This approach combines the power of deep mutational scanning to rapidly evaluate the functional landscape with the power of NMR to provide deeper understanding of the structural and dynamical features of allosteric pathways.

Machine learning: A fundamental test of our understanding of allostery would be to predict the allosteric pathway in a protein with no functional, dynamic, or structural data. Machine learning is a powerful tool to extract common features of allosteric residues from large sequence-function datasets, such as those generated by deep mutational scanning. Machine learning can be employed in two ways: to understand patterns within a family (residue position specific), or for the arguably more challenging task of determining hotspots in any allosteric protein (residue position independent).
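Returning to the double-mutant rescue screen described earlier in this subsection, its analysis step can be sketched as follows: starting from a defective single mutant, flag second-site mutations whose double mutant restores wild-type-like allosteric behavior, and report an additive epistasis term for each. Assumptions: every phenotype is a single score in which wild type is 0 (for example, a log2 induction ratio relative to wild type), effects combine additively on that scale, and the tolerance, mutation names, and values are hypothetical.

```python
def epistasis(double, single_a, single_b):
    """Additive epistasis on a scale where wild type scores 0: observed minus expected."""
    return double - (single_a + single_b)


def find_rescues(primary, singles, doubles, wt_tolerance=0.5):
    """Second-site mutations whose double mutant restores wild-type-like behavior.

    primary: name of a defective single mutant (e.g. constitutively inactive)
    singles: {mutation: score} for single mutants, wild type = 0
    doubles: {(primary, second): score} for double mutants on the same scale
    Returns (second_site, epistasis) pairs, strongest epistasis first.
    """
    if abs(singles[primary]) <= wt_tolerance:
        raise ValueError(f"{primary} is already wild-type-like; nothing to rescue")
    hits = []
    for (first, second), score in doubles.items():
        if first == primary and abs(score) <= wt_tolerance:  # double behaves like wild type
            eps = epistasis(score, singles[primary], singles[second])
            hits.append((second, round(eps, 3)))
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical scores: L50A is a defective primary mutant.
    singles = {"L50A": -3.0, "S120P": -0.2, "G88D": 0.1, "E200K": -0.1}
    doubles = {("L50A", "S120P"): -0.3, ("L50A", "G88D"): -2.8, ("L50A", "E200K"): 0.2}
    print(find_rescues("L50A", singles, doubles))
```

Each rescuing second site is a candidate partner of the primary residue in an allosteric pathway, which is what a follow-up NMR comparison would then test.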
To develop general rules, a machine learning model can be built that links mutational phenotype to local structural properties around the mutated residue. Local structural properties would include descriptors such as hydrogen bonds, electrostatics, van der Waals interactions, and residue dynamics from MD simulations. Large datasets can reveal common structural patterns that affect allosteric function. Using limited mutational data, Demerdash et al. demonstrated the ability of structure-based machine learning to predict allosteric hotspots. After training their model on a diverse set of allosteric proteins, they successfully predicted allosteric hotspots of two unrelated proteins, Lac repressor and myosin25. As additional deep mutational scanning datasets of allosteric proteins become available, the accuracy, resolution, and generalizability of such predictions should improve.

Applications in biomedicine

Discovery of novel allosteric sites: G-protein coupled receptors, nuclear receptors, ion channels, and kinases are all allosteric proteins that together account for 44% of all human protein drug targets26. Discovery of novel allosteric sites is a major pharmacological goal for improving the selectivity and efficacy of drugs. This has motivated the development of new computational tools to predict allosteric sites (PARS27, SPACER28, and MCPath29). Allosteric sites can also be discovered experimentally by screening a dysfunctional allosteric protein for mutations at exposed sites that modulate function. Therapeutic drugs that target these exposed sites can then be designed.

Functional consequences of disease mutations: Sequencing of cancer patient samples is revealing mutations in disease-relevant allosteric proteins at a staggering pace30. However, we lack effective methods to determine the functional
effects of these mutations. Locating the driver mutation(s) among many passenger mutations is notoriously difficult and requires tedious cell-based assays of individual mutants. Major steps toward precision medicine are determining the functional consequences of disease-associated mutations, understanding common molecular mechanisms across different sets of mutants, and developing therapeutic strategies tailored to mutational subtypes. Current methods that predict driver mutations largely from how often a mutation recurs in cancer genomes often fail to identify less common driver mutations31. A potentially less biased alternative is a predictive model that assesses the impact of a mutation on protein function, such as disruption of allostery or loss of stability, based on its structural context. Such a model would be able to score a new mutation using structural heuristics derived from deep mutational scanning of other proteins. This would enable rapid identification and functional classification of driver and passenger mutations for follow-up experimental studies.
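As a minimal sketch of the kind of structure-based model described above, the code trains a logistic-regression classifier by batch gradient descent on a few per-mutation structural descriptors and then scores new mutations. Everything here is hypothetical: the descriptor set (hydrogen bonds lost, burial, distance to the effector site), the labels, and the training rows are made up, and logistic regression is just one simple model choice, not the model used in the studies cited.

```python
import math

# Hypothetical training rows: [hydrogen bonds lost, burial (0-1), distance to effector site (nm)]
TRAIN_X = [
    [2, 0.9, 0.8], [3, 0.8, 1.0], [2, 0.7, 0.5], [1, 0.9, 0.6],  # labeled: disrupts allostery
    [0, 0.1, 2.5], [0, 0.3, 3.0], [1, 0.2, 2.2], [0, 0.2, 1.8],  # labeled: tolerated
]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # dLoss/dlogit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Model probability that a mutation with descriptors x disrupts allostery."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    w, b = train_logistic(TRAIN_X, TRAIN_Y)
    print(round(predict(w, b, [2, 0.85, 0.7]), 3))  # resembles the disruptive examples
    print(round(predict(w, b, [0, 0.15, 2.8]), 3))  # resembles the tolerated examples
```

Because the features describe structural context rather than sequence position, the same fitted weights could in principle score a mutation in a protein absent from the training set, which is the position-independent use case discussed earlier.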

DESIGN OF ALLOSTERY

Designing allostery is key to our efforts to engineer biology because allosteric proteins play major roles as enzymes, sensors, regulators, and signaling proteins. The difficulty in designing allosteric proteins with new molecular recognition, such as new ligand or DNA binding, is that allosteric hotspots are often inadvertently mutated, resulting in loss of switch-like ON/OFF activity24. Nevertheless, protein engineers have used directed evolution to successfully design allosteric proteins by screening away inactive mutants. Improving rational approaches to allosteric protein design will require a better understanding of the role of individual residues in molecular recognition, allostery, and stability, and of their interdependencies. This information enables mutagenesis of targeted residues to change specificity or allosteric mechanism without global disruption of function. In this section, we briefly review previous work on designing allosteric proteins, ordered from large to small scale of structural modification: whole protein fusion, domain insertion, segment swaps, and point mutation. These examples illustrate a tradeoff. Designing allostery through large structural modifications such as protein fusion or domain insertion requires little knowledge of the internal allosteric mechanism of the protein or domain. While this simplifies design, the function of the new protein is constrained by the natural preferences of the fused protein or inserted domain. Targeted mutations, on the other hand, offer greater design programmability but require a deeper understanding of the allosteric mechanism or, alternatively, an effective screening strategy (Fig. 1). We conclude the section with tools to enable programmable design of allostery: incorporating information on allosteric hotspots to guide design, and building a feedback mechanism that learns from the successes and failures of one round to improve the next.


Figure 1: Programmability of design vs. scale of structural perturbation. Larger perturbations, such as whole protein fusion or domain insertion, co-opt natural allosteric pathways and recognition specificities, which limits the scope of redesigned function. Targeted mutations provide a greater degree of programmable control over new function but require more detailed knowledge of the underlying mechanism or an effective screening strategy.

Whole protein fusion or insertion: An allosteric protein can be fused with a non-allosteric protein in a manner that makes the two conformationally coupled. As a result, effector-dependent activation of the allosteric protein indirectly regulates the non-allosteric protein as well. A fusion of maltose-binding protein (MBP) and β-lactamase showed that the allosteric properties of MBP can be transmitted to β-lactamase, allowing β-lactamase activity to be regulated by the presence of maltose32. Interestingly, in a follow-up study, one of the fusion variants was found to be allosterically regulated by zinc, although neither wild-type parental protein binds zinc33. This suggests that new effector specificity and allosteric activity can arise simultaneously in evolution. Another example is the fusion of a LOV domain, which changes conformation when exposed to blue light, with the DNA-binding domain of the Trp repressor to generate a light-responsive transcription factor34.

Domain insertion:
Exchanging domains between homologous proteins has been shown to alter specificities while maintaining allosteric function. This approach takes advantage of the evolutionary compatibility of allosteric communication between the swapped domain and the rest of the parent protein. Swapping DNA-binding domains between homologous transcription factors in the LacI/GalR family, or among mammalian nuclear receptors, produced allosterically functional chimeras with new ligand or DNA specificity35–37. Evolutionarily unrelated protein domains have also been allosterically linked by choosing an optimal protein-protein interface between them. Ranganathan and coworkers rationally linked allosteric pathways across dihydrofolate reductase and a light-sensing Per/Arnt/Sim (PAS) domain by identifying surface positions that were functionally coupled to the active site of each protein38. Interactions across these surface positions produced functional variants with light-modulated reductase activity. Oakes et al. prescanned Cas9 to locate permissive positions that could accommodate inserted protein domains without loss of Cas9 binding or cleavage function. By inserting a ligand-responsive allosteric domain at a permissive position, they designed a small-molecule-inducible Cas939.

Segment swaps: Swapping short segments, generally fewer than 20 amino acids, among evolutionarily related proteins can generate chimeric allosteric proteins with new functions such as novel specificity, greater stability, or higher catalytic efficiency. Selecting appropriate crossover locations is important to maintain a folded structure and allostery. The SCHEMA method selects crossover locations based on structural blocks with native-like contacts, thereby preserving a native-like allosteric pathway. This approach has been used to generate libraries of chimeras with distinct functions by swapping segments from multiple homologous sequences40,41.
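The SCHEMA idea of choosing crossovers that preserve native-like contacts can be made concrete with its disruption score: count residue pairs that are in contact in the parent structure but whose amino acid pair in the chimera does not occur at those two positions in any parent. A minimal sketch, assuming aligned parent sequences of equal length and a precomputed list of contacting position pairs (SCHEMA commonly defines contacts with a 4.5 Å heavy-atom cutoff); the two toy parents and three contacts below are hypothetical.

```python
def make_chimera(parents, blocks):
    """Assemble a chimera from (start, end, parent_index) blocks covering the sequence."""
    return "".join(parents[k][start:end] for start, end, k in blocks)


def schema_disruption(chimera, parents, contacts):
    """SCHEMA-style disruption E: contacts whose residue pair is absent from every parent.

    chimera, parents: aligned sequences of equal length
    contacts: (i, j) position pairs in contact in the parent structure
    """
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1  # this pair never co-occurs in a parent, so the contact is disrupted
    return broken


if __name__ == "__main__":
    parent_a, parent_b = "ACDEFGHIK", "ACNEFWHLK"  # hypothetical aligned parents
    contacts = [(2, 7), (5, 6), (0, 8)]             # hypothetical contact map
    for crossover in range(1, 9):
        chimera = make_chimera([parent_a, parent_b], [(0, crossover, 0), (crossover, 9, 1)])
        print(crossover, chimera, schema_disruption(chimera, [parent_a, parent_b], contacts))
```

Scanning crossover points this way shows why SCHEMA favors breakpoints that keep contacting residues inside the same parental block: a crossover between positions 2 and 7 splits the (2, 7) contact and creates a residue pair that neither parent has.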
Despite little understanding of the molecular interplay of residues required to make a functional cytochrome P450, recombination of sequences from multiple parental P450s yielded a library containing many folded, active proteins.

Point mutations: Point mutations provide a high degree of programmable control in designing specific molecular interactions or modifying allosteric properties. Current strategies rely largely on high-throughput screening because rational design of allostery remains challenging. Recently, an allosteric transcription factor was redesigned to respond to new inducers by screening a large number of computationally designed sequences24. Because the design algorithm optimized protein-ligand interactions without accounting for residues important for allostery, nearly 80% of the sequences were allosterically dysfunctional. Directed evolution is highly effective and requires no knowledge of the underlying allosteric processes, because only allosterically active mutants with the desired new function are enriched. However, some studies have shown that the allosteric network can be rationally engineered by mutating key residues involved in signal propagation. Two mutations in the Tet repressor (TetR) reversed its allosteric mechanism from releasing DNA to binding DNA upon effector binding42. In another example, the β-subunit of tryptophan synthase, which requires the α-subunit for catalytic activity, was modified by transferring mutations from a standalone β-subunit43. This activated the β-subunit by triggering a conformational change that it would normally adopt only in the presence of the α-subunit. Wu et al. used structure-guided mutagenesis to constitutively activate a Baeyer-Villiger monooxygenase by mimicking effector-induced domain motion44. Based on the crystal structure, they mutated sites distal to the active site, yielding mutants with altered specificity toward
substituted cyclohexanones. Other studies have used the evolutionary record of amino acid variation at every position, explored within the constraints of function, to rationally choose residues to mutate. From multiple sequence alignments of two-component signaling proteins, Skerker et al. identified pairs of co-varying residues across a protein-protein interface, which they used to rewire the signaling pathway of a histidine kinase to phosphorylate a non-cognate response regulator45. In an analogous approach, ancestral protein reconstruction was used to examine changes in amino acid composition over evolutionary time, leading to the identification of three residues that shift the estrogen receptor's binding specificity from the estrogen response element to other steroid response elements without disrupting allostery46.

Tools for enabling allostery design

Deep mutational scanning: Computational protein design is a powerful tool to optimize local interactions within a protein, but it does not account for systemic allosteric effects. Because some residues are involved in both molecular recognition and allostery, the design algorithm may inadvertently mutate these hotspots to improve local interactions at the cost of disrupting allosteric activity. Deep mutational scanning prior to design can identify positions that are highly intolerant to substitution, and these data can be used to build an exhaustive map of allowable mutations at every position. Further, epistatic dependencies between residues can be determined by exhaustive two- or three-residue combinatorial scans around a binding pocket. This information can then guide design by biasing the choice of mutations toward those least likely to disrupt allostery.

Phylogenetic analysis: Mining phylogenetic trees for amino acid conservation and coevolution is a powerful approach to infer residues important for allostery and their connectivity. Coevolution is particularly useful for determining long-range residue-residue coupling.
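The coevolution analysis just mentioned can be sketched with mutual information (MI) between alignment columns, followed by the average product correction (APC) that is commonly used to reduce background signal, and a minimum sequence separation so that only long-range pairs are reported. This is a simplified stand-in, not statistical coupling analysis or direct-coupling analysis: there is no sequence weighting or pseudocount, and the eight-sequence toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def coupled_pairs(msa, min_separation=3, top=3):
    """Rank column pairs by APC-corrected MI, keeping pairs at least min_separation apart."""
    columns = list(zip(*msa))
    n_cols = len(columns)
    mi = {}
    for i, j in combinations(range(n_cols), 2):
        mi[(i, j)] = mi[(j, i)] = mutual_information(columns[i], columns[j])
    col_mean = [sum(mi[(i, j)] for j in range(n_cols) if j != i) / (n_cols - 1) for i in range(n_cols)]
    overall = sum(col_mean) / n_cols  # mean MI over all column pairs
    scores = []
    for i, j in combinations(range(n_cols), 2):
        if j - i < min_separation:
            continue  # skip sequence-local pairs
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores.append(((i, j), mi[(i, j)] - apc))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical alignment: columns 0 and 5 co-vary perfectly (A<->D, R<->E).
    toy_msa = ["AKLVGD", "AELIGD", "AKMVSD", "AEMISD", "RKLVGE", "REMIGE", "RKLISE", "REMVSE"]
    for pair, score in coupled_pairs(toy_msa):
        print(pair, round(score, 3))
```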
Mutations at residues far from the active site often influence activity, and these distal residues play an important role in potentiating new function. Directed evolution of enzymes abounds with examples of mutations far from the active site that affect substrate specificity or kinetics. Mutating targeted distal residues could induce structural changes that ease the design of new function. The growing volume of protein sequences in databases will improve the accuracy of coevolution analysis and extend its applicability to more protein families.

Design of molecular networks: A hallmark of native proteins is their elaborate networks of hydrogen-bonded or electrostatically interacting residues. Studies have shown that enthalpy differences arising from rearrangements of electrostatic networks can drive allosteric conformational change47. Because computational design decomposes the energy into a sum of pairwise interactions, designing such multibody networks has traditionally been challenging. However, new computational tools can recapitulate native-like hydrogen-bond networks in de novo designed proteins48. This could enable the design of allosterically controlled proteins by programming hydrogen-bond rearrangements step by step from the inactive to the active state.

Machine learning: Improving design is an iterative process that requires a framework to learn from previous design attempts and incorporate those lessons. The learn step is critical because mutational patterns associated with successful and failed designs can be carried into the next round of design. A machine learning model trained on a small but well-characterized set of 129 enzyme mutants could accurately predict mutations important for stability and kinetics49. High-throughput screening of allosteric protein libraries will provide tens of thousands of examples of successful and failed designs, a rich resource for extracting design heuristics with high accuracy.
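The learn step can be illustrated with a very small model. The Python below is a minimal sketch, assuming each tested design is recorded as the set of mutations it carries plus a pass/fail label from an allostery screen; it fits an L2-regularized logistic regression on one-hot mutation features by plain gradient descent and then scores new designs. The mutation names, the choice of model and the hyperparameters (`lr`, `epochs`, `l2`) are hypothetical and illustrative; this is not the model used in ref. 49.

```python
import math
from typing import List, Sequence, Set, Tuple

# Each design is the set of mutations it carries (hypothetical names such as "A42G"),
# labelled 1 if it retained allosteric activity in the screen and 0 otherwise.
Design = Set[str]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def featurize(design: Design, vocab: Sequence[str]) -> List[float]:
    """One-hot encode a design over a fixed mutation vocabulary."""
    return [1.0 if m in design else 0.0 for m in vocab]


def train_logistic(designs: List[Design], labels: List[int], lr: float = 0.5,
                   epochs: int = 2000, l2: float = 0.01
                   ) -> Tuple[List[str], List[float], float]:
    """Fit L2-regularized logistic regression by full-batch gradient descent."""
    vocab = sorted(set().union(*designs))
    X = [featurize(d, vocab) for d in designs]
    n = len(X)
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # penalty gradient; the bias is not regularized
        grad_b = 0.0
        for xi, yi in zip(X, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return vocab, w, b


def predict(design: Design, vocab: Sequence[str], w: Sequence[float], b: float) -> float:
    """Predicted probability of success; mutations never seen in training add nothing."""
    x = featurize(design, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

On a toy set in which every design carrying A42G kept allostery, the fitted weight for A42G comes out positive and larger than the weights of the other mutations, so candidates carrying it rank higher in the next round. Logistic regression is used here only because its per-mutation weights read directly as design heuristics; it does not capture epistasis, which richer models could.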

Figure 2: A design-build-test-learn methodology for allosteric protein design. Design is guided by allosteric hotspot information, and a feedback mechanism carries lessons from each round into successive rounds of design.
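Fig. 2 casts design as being guided by allosteric hotspot information. As a rough sketch of how deep mutational scanning data might be turned into that information, the Python below assumes a hypothetical table of per-variant scores (log2 enrichment relative to wild type, so values near 0 mean a wild-type-like allosteric response). It flags positions where almost no substitution is tolerated as hotspots to freeze, then restricts a design's candidate mutations to those the scan shows are tolerated, best first. The score convention, the `threshold` and `min_tolerance` cutoffs and all function names are illustrative assumptions, and the sketch ignores the epistatic dependencies that combinatorial scans would capture.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: DMS scores keyed by (position, substituted amino acid).
# Assumed convention: log2 enrichment vs. wild type; strongly negative = allostery lost.
Mutation = Tuple[int, str]
Scores = Dict[Mutation, float]


def position_tolerance(scores: Scores, threshold: float = -1.0) -> Dict[int, float]:
    """Fraction of measured substitutions at each position scoring above `threshold`."""
    measured: Dict[int, int] = {}
    tolerated: Dict[int, int] = {}
    for (pos, _aa), score in scores.items():
        measured[pos] = measured.get(pos, 0) + 1
        if score > threshold:
            tolerated[pos] = tolerated.get(pos, 0) + 1
    return {pos: tolerated.get(pos, 0) / n for pos, n in measured.items()}


def find_hotspots(scores: Scores, threshold: float = -1.0,
                  min_tolerance: float = 0.2) -> Set[int]:
    """Positions where almost no substitution is tolerated; keep them wild type."""
    tolerance = position_tolerance(scores, threshold)
    return {pos for pos, frac in tolerance.items() if frac < min_tolerance}


def filter_design_mutations(candidates: List[Mutation], scores: Scores,
                            threshold: float = -1.0,
                            min_tolerance: float = 0.2) -> List[Mutation]:
    """Keep candidate mutations the scan says are tolerated, best-scoring first.

    Mutations at hotspot positions, scoring at or below `threshold`, or never
    measured are dropped (a conservative choice; a real pipeline might
    down-weight them instead of discarding them).
    """
    frozen = find_hotspots(scores, threshold, min_tolerance)
    allowed = [m for m in candidates
               if m[0] not in frozen and scores.get(m, float("-inf")) > threshold]
    return sorted(allowed, key=lambda m: scores[m], reverse=True)
```

Freezing whole positions, rather than only individual bad substitutions, reflects the point made under Tools for enabling allostery design that some residues serve both recognition and allostery and should be protected from the design algorithm.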

CONCLUSIONS: A complete understanding of allostery would require a multi-pronged approach integrating structure, function and dynamics. Structure alone cannot explain dynamical pathways, dynamics alone does not explain functional role, and function alone does not reveal the molecular basis. Deep mutational scanning can be envisioned as a tool for comprehensive functional characterization that complements in-depth structural and biophysical studies of select variants chosen for their functional properties. However, deep mutational scanning is restricted to proteins whose activity can be linked to a high-throughput screen or a genetic selection. While this approach has been shown to generalize to many types of proteins50–52, more effort is needed to develop screens specifically for allosteric proteins. To improve the design of allosteric proteins, we propose a design-build-test-learn methodology in which computational design is supplemented by experimental and phylogenetic data to facilitate effective navigation of design space, and lessons from every design cycle feed back to improve the next iteration (Fig. 2). The current state of the art in allostery design is engineering natural allosteric proteins with new functionalities. Although undoubtedly challenging, this approach largely relies on preserving the allosteric machinery of the starting protein. However, in the last decade, giant strides have been made in the design of new proteins from scratch based on fundamental rules of protein structure. Inspired by these advances, we suggest that the next frontier could be de novo design of allosteric proteins.
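The phylogenetic data in such a loop can be as simple as covariation statistics computed from a multiple sequence alignment. The Python below is a minimal illustration rather than a production method: it scores every pair of alignment columns by mutual information (in bits) and returns the most strongly coupled pairs as candidate long-range couplings. It assumes equal-length aligned sequences and treats gap characters as an ordinary symbol. Raw mutual information is inflated by column entropy and shared phylogeny, so practical analyses add corrections (for example, the average product correction) and sequence reweighting.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j from raw frequencies."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def top_coupled_pairs(msa: List[str], k: int = 5) -> List[Tuple[int, int, float]]:
    """Return the k column pairs (i, j, MI) with the highest mutual information."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("sequences must be aligned to equal length")
    scored = [(mutual_information(msa, i, j), i, j)
              for i, j in combinations(range(length), 2)]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep column order
    return [(i, j, mi) for mi, i, j in scored[:k]]
```

For the toy alignment `["ALKD", "GLED", "ALKE", "GLEE"]`, columns 0 and 2 co-vary perfectly while the others are conserved or independent, so `top_coupled_pairs(msa, k=1)` returns `[(0, 2, 1.0)]`.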

Acknowledgements: I would like to thank Megan Leander and Kyle Nishikawa for critical review of this manuscript and Robin Davies for helping with figures.

Funding Sources: This work was supported by a grant from the Army Research Laboratory (W911NF-17-1-0043) and the Shaw Scientist Award from the Greater Milwaukee Foundation.

References

(1) Laskowski, R. A., Gerick, F., and Thornton, J. M. (2009) The structural basis of allosteric regulation in proteins. FEBS Lett. 583, 1692–1698.
(2) Kuriyan, J., and Eisenberg, D. (2007) The origin of protein interactions and allostery in colocalization. Nature 450, 983–990.
(3) Monod, J., Wyman, J., and Changeux, J. P. (1965) On the nature of allosteric transitions: a plausible model. J. Mol. Biol. 12, 88–118.
(4) Koshland, D. E., Némethy, G., and Filmer, D. (1966) Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 5, 365–385.
(5) Nussinov, R. (2016) Introduction to protein ensembles and allostery. Chem. Rev. 116, 6263–6266.
(6) Hilser, V. J. (2010) An ensemble view of allostery. Science 327, 653–654.
(7) Hidalgo, P., and MacKinnon, R. (1995) Revealing the architecture of a K+ channel pore through mutant cycles with a peptide inhibitor. Science 268, 307–310.
(8) Sadovsky, E., and Yifrach, O. (2007) Principles underlying energetic coupling along an allosteric communication trajectory of a voltage-activated K+ channel. Proc. Natl. Acad. Sci. U. S. A. 104, 19813–19818.
(9) Gandhi, P. S., Chen, Z., Mathews, F. S., and Di Cera, E. (2008) Structural identification of the pathway of long-range communication in an allosteric enzyme. Proc. Natl. Acad. Sci. U. S. A. 105, 1832–1837.
(10) Nussinov, R., and Tsai, C.-J. J. (2013) Allostery in disease and in drug discovery. Cell 153, 293–305.
(11) Schlitter, J., Engels, M., and Krüger, P. (1994) Targeted molecular dynamics: a new approach for searching pathways of conformational transitions. J. Mol. Graph. 12, 84–89.
(12) Lockless, S. W., and Ranganathan, R. (1999) Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299.
(13) Süel, G. M., Lockless, S. W., Wall, M. A., and Ranganathan, R. (2002) Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat. Struct. Biol. 10, 59–69.
(14) Lisi, G. P., Manley, G. A., Hendrickson, H., Rivalta, I., Batista, V. S., and Loria, J. P. (2016) Dissecting dynamic allosteric pathways using chemically related small-molecule activators. Structure 24, 1155–1166.
(15) Daily, M. D., and Gray, J. J. (2007) Local motions in a benchmark of allosteric proteins. Proteins 67, 385–399.
(16) Cooper, A., and Dryden, D. T. F. (1984) Allostery without conformational change. A plausible model. Eur. Biophys. J. 11, 103–109.
(17) Kern, D., and Zuiderweg, E. R. P. (2003) The role of dynamics in allosteric regulation. Curr. Opin. Struct. Biol. 13, 748–757.
(18) Tzeng, S.-R., and Kalodimos, C. G. (2012) Protein activity regulation by conformational entropy. Nature 488, 236–240.
(19) Farber, P. J., and Mittermaier, A. (2015) Relaxation dispersion NMR spectroscopy for the study of protein allostery. Biophys. Rev. 7, 191–200.
(20) Holliday, M. J., Camilloni, C., Armstrong, G. S., Vendruscolo, M., and Eisenmesser, E. Z. (2017) Networks of dynamic allostery regulate enzyme function. Structure 25, 276–286.
(21) Fowler, D. M., and Fields, S. (2014) Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807.
(22) Fowler, D. M., Araya, C. L., Fleishman, S. J., Kellogg, E. H., Stephany, J. J., Baker, D., and Fields, S. (2010) High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741–746.
(23) Suckow, J., Markiewicz, P., Kleina, L. G., Miller, J., Kisters-Woike, B., and Müller-Hill, B. (1996) Genetic studies of the Lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure. J. Mol. Biol. 261, 509–523.
(24) Taylor, N. D., Garruss, A. S., Moretti, R., Chan, S., Arbing, M. A., Cascio, D., Rogers, J. K., Isaacs, F. J., Kosuri, S., Baker, D., Fields, S., Church, G. M., and Raman, S. (2016) Engineering an allosteric transcription factor to respond to new ligands. Nat. Methods 13, 177–183.
(25) Demerdash, O. N., Daily, M. D., and Mitchell, J. C. (2009) Structure-based predictive models for allosteric hot spots. PLoS Comput. Biol. 5, e1000531.
(26) Santos, R., Ursu, O., Gaulton, A., Bento, A. P., Donadi, R. S., Bologa, C. G., Karlsson, A., Al-Lazikani, B., Hersey, A., Oprea, T. I., and Overington, J. P. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16, 19–34.
(27) Panjkovich, A., and Daura, X. (2014) PARS: a web server for the prediction of Protein Allosteric and Regulatory Sites. Bioinformatics 30, 1314–1315.
(28) Goncearenco, A., Mitternacht, S., Yong, T., Eisenhaber, B., Eisenhaber, F., and Berezovsky, I. N. (2013) SPACER: server for predicting allosteric communication and effects of regulation. Nucleic Acids Res. 41, W266–W272.
(29) Kaya, C., Armutlulu, A., Ekesan, S., and Haliloglu, T. (2013) MCPath: Monte Carlo path generation approach to predict likely allosteric pathways and functional residues. Nucleic Acids Res. 41, W249–W255.
(30) Cancer Genome Atlas Research Network, Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120.
(31) Raphael, B. J., Dobson, J. R., Oesper, L., and Vandin, F. (2014) Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med. 6, 5.
(32) Guntas, G., Mansell, T. J., Kim, J. R., and Ostermeier, M. (2005) Directed evolution of protein switches and their application to the creation of ligand-binding proteins. Proc. Natl. Acad. Sci. U. S. A. 102, 11224–11229.
(33) Liang, J., Kim, J. R., Boock, J. T., Mansell, T. J., and Ostermeier, M. (2007) Ligand binding and allostery can emerge simultaneously. Protein Sci. 16, 929–937.
(34) Strickland, D., Moffat, K., and Sosnick, T. R. (2008) Light-activated DNA binding in a designed allosteric protein. Proc. Natl. Acad. Sci. U. S. A. 105, 10709–10714.
(35) Tungtur, S., and Parente, D. J. (2011) Functionally important positions can comprise the majority of a protein's architecture. Proteins 79, 1589–1608.
(36) Meinhardt, S., Manley, M. W., Parente, D. J., and Swint-Kruse, L. (2013) Rheostats and toggle switches for modulating protein function. PLoS One 8, e83502.
(37) Connaghan, K. D., Miura, M. T., Maluf, N. K., Lambert, J. R., and Bain, D. L. (2013) Analysis of a glucocorticoid-estrogen receptor chimera reveals that dimerization energetics are under ionic control. Biophys. Chem. 172, 8–17.
(38) Lee, J., Natarajan, M., Nashine, V. C., Socolich, M., Vo, T., Russ, W. P., Benkovic, S. J., and Ranganathan, R. (2008) Surface sites for engineering allosteric control in proteins. Science 322, 438–442.
(39) Oakes, B. L., Nadler, D. C., Flamholz, A., Fellmann, C., Staahl, B. T., Doudna, J. A., and Savage, D. F. (2016) Profiling of engineering hotspots identifies an allosteric CRISPR-Cas9 switch. Nat. Biotechnol. 34, 646–651.
(40) Otey, C. R., Silberg, J. J., Voigt, C. A., Endelman, J. B., Bandara, G., and Arnold, F. H. (2004) Functional evolution and structural conservation in chimeric cytochromes P450: calibrating a structure-guided approach. Chem. Biol. 11, 309–318.
(41) Otey, C. R., Landwehr, M., Endelman, J. B., Hiraga, K., Bloom, J. D., and Arnold, F. H. (2006) Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol. 4, e112.
(42) Kamionka, A., Bogdanska-Urbaniak, J., Scholz, O., and Hillen, W. (2004) Two mutations in the tetracycline repressor change the inducer anhydrotetracycline to a corepressor. Nucleic Acids Res. 32, 842–847.
(43) Murciano-Calles, J., Romney, D. K., Brinkmann-Chen, S., Buller, A. R., and Arnold, F. H. (2016) A panel of TrpB biocatalysts derived from tryptophan synthase through the transfer of mutations that mimic allosteric activation. Angew. Chem. Int. Ed. Engl. 55, 11577–11581.
(44) Wu, S., Acevedo, J. P., and Reetz, M. T. (2010) Induced allostery in the directed evolution of an enantioselective Baeyer-Villiger monooxygenase. Proc. Natl. Acad. Sci. U. S. A. 107, 2775–2780.
(45) Skerker, J. M., Perchuk, B. S., Siryaporn, A., Lubin, E. A., Ashenberg, O., Goulian, M., and Laub, M. T. (2008) Rewiring the specificity of two-component signal transduction systems. Cell 133, 1043–1054.
(46) McKeown, A. N., Bridgham, J. T., Anderson, D. W., Murphy, M. N., Ortlund, E. A., and Thornton, J. W. (2014) Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 159, 58–68.
(47) Kumawat, A., and Chakrabarty, S. (2017) Hidden electrostatic basis of dynamic allostery in a PDZ domain. Proc. Natl. Acad. Sci. U. S. A. 114, E5825–E5834.
(48) Boyken, S. E., Chen, Z., Groves, B., Langan, R. A., Oberdorfer, G., Ford, A., Gilmore, J. M., Xu, C., DiMaio, F., Pereira, J. H., Sankaran, B., Seelig, G., Zwart, P. H., and Baker, D. (2016) De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687.
(49) Carlin, D. A., Hapig-Ward, S., Chan, B. W., Damrau, N., Riley, M., Caster, R. W., Bethards, B., and Siegel, J. B. (2017) Thermal stability and kinetic constants for 129 variants of a family 1 glycoside hydrolase reveal that enzyme activity and stability can be separately designed. PLoS One 12, e0176255.
(50) Araya, C., Fowler, D., Chen, W., Muniez, I., Kelly, J., and Fields, S. (2012) A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. 109, 16858–16863.
(51) Whitehead, T. A., Chevalier, A., Song, Y., Dreyfus, C., Fleishman, S. J., De Mattos, C., Myers, C. A., Kamisetty, H., Blair, P., Wilson, I. A., and Baker, D. (2012) Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat. Biotechnol. 30, 543–548.
(52) Starita, L. M., Pruneda, J. N., Lo, R. S., Fowler, D. M., Kim, H. J., Hiatt, J. B., Shendure, J., Brzovic, P. S., Fields, S., and Klevit, R. E. (2013) Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proc. Natl. Acad. Sci. U. S. A. 110, E1263–E1272.