Perspective pubs.acs.org/jmc
Elements and Modulation of Functional Dynamics
Alan C. Gibbs*
Janssen Pharmaceutical Research and Development, LLC, Welsh and McKean Road, Spring House, Pennsylvania 19477-0776, United States
ABSTRACT: The existing structure−function paradigm of drug discovery has been evolving toward the essential incorporation of dynamics data. This new functional dynamics paradigm emphasizes conformational entropy as a driving force of protein function and intermolecular recognition. Conformational dynamics (a proxy of conformational entropy) impacts the degree of protein (dis)order and the constitution of the conformational ensemble, the mechanisms of allostery and drug resistance, and the free energy of ligand binding. Specific protein and ligand conformations facilitate favorable, reciprocal interactions. The number of protein and ligand conformers that exhibit favorable binding interactions will vary from system to system. All binding scenarios can modulate protein dynamics by various levels of enthalpic and entropic contribution, with significant influence on the functional dynamics of the system. Analysis and consideration of resulting changes of activity, signaling, catalysis, and subsequent phenotypic outcome are powerful motivations in the drug design process.
■
INTRODUCTION The present era of macromolecular structure determination began in 1958 with the first protein (myoglobin) crystal structure by X-ray crystallography1 and has since produced a wealth of data.2 Thousands of structures have been deposited in the Protein Data Bank (PDB) and describe, in fascinating detail, the complex structural architecture and inner workings of a diverse range of biomolecules. Often, noncovalent interactions in molecular complexes reveal important aspects of biological pathways and functions. These data are instrumental for a variety of applications and have been successfully implemented in drug discovery by structure-based drug design methods. Unfortunately, one negative consequence of the explosion of crystallographic structural data is the inadvertent suppression of a key essence of biomolecules: they have motion. The static view of macromolecules was alluded to over 30 years ago by Phillips, “...the period in 1965−75 may be described as the decade of the rigid macromolecule. Brass models of double helical DNA and a variety of protein molecules dominated the scene and much of the thinking”, which persisted for some time thereafter.3 Despite this, it has been known since the first crystal structure of myoglobin was reported that molecules are dynamic, not static, as was described experimentally with hydrogen/deuterium exchange experiments for the same molecule.4 Indeed, it is now known that a hierarchy of motional time scales exists for a protein in solution. Motions on the second time scale usually involve global translation and diffusion. Motions on the micro- to millisecond time scale correspond to domain motions, are often functionally relevant, and may be indicative of energetically excited conformational states.5,6 Motions on the pico- to nanosecond time scale are important because their high frequency affects the entropy of the system.7
© 2014 American Chemical Society
A profusion of biomolecular dynamics studies over the last few decades has advanced the field and uncovered a diverse array of accessible protein motions, detailing their significance to biological function and their relationships to structure.8 As a whole, these studies suggest the relationships between structure and dynamics form the basis of a true biomolecular representation, one that includes three spatial dimensions (structure) and one temporal dimension (dynamics), and an atomic-level characterization that requires all four interdependent dimensions. Functional dynamics describe protein function as it relates to protein dynamics or, alternatively, how motions are involved in the biological functioning of a molecule. More specifically, it is the study of how motion influences functional behavior in processes such as catalysis, communication, recognition, intrinsic disorder, mutational drug resistance, and long-range allosteric signaling to name a few. A variety of determinants can positively or negatively influence native functional dynamics of a given protein. Determinants such as sequence composition undoubtedly have the most influence, with contributions from environmental factors like molecular crowding, temperature, and pH; covalent and noncovalent (alternative folding) posttranslational modifications; and finally ortho- and allosteric ligand binding, which will be discussed herein. Modulation of functional dynamics by small molecule binding is a complementary, if not alternative, strategy to competitive obstruction of the catalytic site as a means of inhibition in drug discovery.9 The goal of this strategy lies in the “tuning” of small molecules to elicit specific dynamic responses in protein partners, which corresponds to favorable functional outcomes. In order to leverage modulation of functional dynamics as an effective drug discovery methodReceived: February 28, 2014 Published: June 10, 2014 7819
dx.doi.org/10.1021/jm500325k | J. Med. Chem. 2014, 57, 7819−7837
Journal of Medicinal Chemistry
Figure 1. Time scale of molecular motion (seconds). Double-headed arrows show approximate range of individual motions. Single-headed arrows indicate direction of increasing amplitude and frequency.
fluctuation fit, of molecular recognition between enzyme and substrate.19 A review on the subject by Závodszky and Hajdú20 highlights Straub’s personal description of fluctuation fit: “instead of a fit induced by the substrate, I would suggest a fluctuating enzyme molecule, one particular conformation of which is able to bind the substrate and getting stabilized by it, while others do not”. Modest refinement of the fluctuation fit model defines a popular notion of macromolecular dynamics, which posits that at thermal equilibrium multiple conformations exist at a given time, including conformations that bind and are complementary to substrate. Upon substrate binding, a conformer population shift, or redistribution, of substates occurs, where enzyme conformers that bind substrate are removed from the gross population, causing a population shift toward these conformers.21 This concept contrasts with the induced fit model, where conformations that bind substrate exist only in the presence of substrate and are therefore added to the number of conformations accessible to the macromolecule. There has been extensive discussion around the merits of the two theories, induced fit and fluctuation fit (also known as conformational selection).22 Ascribing one of the two models to a particular system is as important to a functional dynamics study as the influence of a proposed binding model is to the accurate description of kinetic and thermodynamic details of a substrate−protein interaction. However, this assignment has not been a trivial task, as examination of structural (atomic coordinate-based) information probably cannot provide conclusive or sufficient evidence with which to differentiate induced fit from fluctuation fit models.23,24 Multiple conformations of an apo form of an enzyme may support conformational selection, whereas multiple conformations of holo and apo forms of a protein may support the induced fit model.
Nonetheless, the latter scenario may be refuted because the bound-like conformer of the apo protein has the potential to pre-exist in solution. In addition, the lack of detected sparsely populated states, which may be important for
ology, accurate characterization and understanding of protein dynamics must come first. A number of biophysical techniques have contributed to the current picture of molecular dynamics. A few examples include vibrational spectroscopy,10 hydrogen/ deuterium exchange (HDX),11 isothermal titration calorimetry (ITC),12 and fluorescence spectroscopy.13 However, rational molecular design requires atomic-level detail, and the theoretical and experimental methods most suited for this task are molecular dynamics (MD)14 simulations, X-ray crystallography,15 and nuclear magnetic resonance (NMR).16 The first section of this Perspective provides some information about specific aspects of protein dynamics implicated in functional dynamics and about the tools and methods associated with identification and characterization. The second section highlights how different ligands can, upon binding, modulate protein dynamics in distinctive ways and how this modulation can affect functional dynamics. The information presented in this Perspective is by no means a comprehensive account of all aspects of functional dynamics and its modulation; an attempt is made to highlight some of the relevant tools, concepts, and key studies. Elements of Functional Dynamics. Molecular Recognition. The precise interaction between two molecules is essential for many biological functions, and the details and characterization of such molecular recognition have been the subject of interest for over a century. 
The early key−lock theory of Fischer proposed a precondition of complementarity between enzyme and substrate, where complementarity is established from rigid, chiral interactions.17 Koshland’s induced fit theory, proposed 60 years later, built on Fischer’s ideas of complementarity and introduced the concept of a flexible enzyme.18 The induced fit theory is similar to the key−lock theory in putting forth the idea of a rigid enzyme prior to interaction with substrate, but it differs in the hypothesis that a conformational change in the enzyme occurs following interaction with substrate to maximize complementarity. A few years later, Straub proposed an alternative theory,
surface of a pure harmonic system, that enables vibrational energy transfer between high-frequency, localized modes. The transfer of energy by mode coupling is accomplished by Fermi resonances that may occur between spatially overlapping modes. Energy transfer occurs only if frequency matching between the modes is appropriate (i.e., from a higher frequency mode to one of an intermediate frequency).31 From a functional dynamics standpoint, vibrational energy transfer can be involved in a variety of processes such as dissipation of heat from reaction centers, energy transport to modulate the speed of catalysis, long-range communication, and allostery.31 A practical tool to aid study of complex protein (an)harmonic energy surfaces is the potential energy landscape. Frauenfelder proposed that protein dynamics are analogous to motions of spin glasses (disordered magnets) and can be described by rough energy landscapes.32 Protein dynamics occur (at a given thermal equilibrium) over a range of time scales, magnitudes, and directions, resulting in complex multidimensional energy surfaces.8 Conformational energy landscape diagrams simplify the picture to two dimensions (x axis = conformational coordinate; y axis = energy) and are useful as a time-dependent “slice” through the energy surface (Figure 2). The peaks and valleys of the landscape diagram
molecular recognition and ligand binding, does not mean that such excited states do not exist. It has been suggested that the most reliable method of distinguishing the two models is by observing the kinetic signatures of ligand binding.25 Thus, if the rate of association, kobs, decreases with or is independent of ligand concentration, [L], then the system is indicative of conformational selection, and the induced fit model is unambiguously ruled out. In contrast, when kobs increases with [L], either model is possible. Although this latter point may seem to contradict previous studies, which state that an increasing kobs with [L] is proof of induced fit,26 it has been shown that this statement of proof only applies when the rapid equilibrium approximation (that conformational transitions are rate-limiting) is incorrectly assumed. In a recent overview, conformational selection appears to explain the majority of data from the systems examined.27 A combination of induced fit and conformational selection may be most appropriate to describe certain systems. X-ray crystallography- and NMR-derived data suggest that this is the case for calmodulin. Binding of Ca2+ gives calmodulin a propensity to form compact structures, a requirement for binding peptide substrate derived from myosin light chain kinase. Apo structures of calmodulin are unable to form the necessary compact conformers.28 Despite their differences, both the induced fit and conformational selection models require some level of protein flexibility and movement. Protein movement is composed of two implicitly linked motional phenomena, molecular tumbling and Brownian diffusion, and a range of internal motions: side-chain, backbone, and domain movements. The discussion herein will focus on internal motions, which include the following: bond stretching, bending, and librations; torsion angle rotation and side-chain rotamer sampling; flexing of loops and lids; and hinge, shear, and rotational motions of subunits.
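The kinetic signatures discussed above can be illustrated with a minimal sketch. It assumes the standard two-step schemes and the rapid equilibrium approximation (binding fast, conformational transition rate-limiting), which is precisely the limiting case the text cautions about; the closed-form expressions, function names, and rate constants below are illustrative assumptions and are not taken from this Perspective. Under this approximation the observed rate (k_obs in the sketch) falls with [L] for conformational selection and rises with [L] for induced fit. Outside this approximation, an increasing observed rate no longer discriminates between the models.

```python
def kobs_conformational_selection(ligand, k_r, k_minus_r, k_d):
    """Rapid-equilibrium k_obs for E* <-> E (slow), then E + L <-> EL (fast).

    k_r: E* -> E, k_minus_r: E -> E*, k_d: dissociation constant of EL.
    """
    return k_r + k_minus_r * k_d / (k_d + ligand)


def kobs_induced_fit(ligand, k_r, k_minus_r, k_d):
    """Rapid-equilibrium k_obs for E + L <-> EL (fast), then EL <-> E'L (slow).

    k_r: EL -> E'L, k_minus_r: E'L -> EL, k_d: dissociation constant of EL.
    """
    return k_minus_r + k_r * ligand / (k_d + ligand)


# Hypothetical values: rates in s^-1, concentrations and k_d in the same units.
concentrations = [0.0, 1.0, 10.0, 100.0]
selection = [kobs_conformational_selection(c, 5.0, 20.0, 10.0) for c in concentrations]
induced = [kobs_induced_fit(c, 5.0, 20.0, 10.0) for c in concentrations]

for c, cs, fit in zip(concentrations, selection, induced):
    print(f"[L]={c:6.1f}  selection k_obs={cs:6.2f}  induced fit k_obs={fit:6.2f}")
```

In practice the two expressions would be fitted to measured relaxation rates; the sketch only shows the opposite trends that arise in this limiting case.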
Intramolecular protein motions are dependent on primary sequence, tertiary fold, and environment and are characterized by amplitudes, frequencies (time scales), and energies, which occur roughly in the following ranges: amplitude 0.01−100 Å, energy 0.1−100 kcal/mol, and time 10^−15 to 10^3 seconds, as detailed in Figure 1.29 The Energy Landscape. All proteins contain vibrational degrees of freedom at thermal equilibrium (Boltzmann energy). Normal modes describe vibrational degrees of freedom by simple harmonic oscillations. The number of normal modes for a nonlinear molecule is given by 3N − 6, where N = the number of atoms. For example, a protein with 300 residues with an average of 15 atoms/residue will have approximately 4500 total atoms. Because each atom has three degrees of freedom, the protein will have well over 13 000 vibrational degrees of freedom, or normal modes. Normal modes are defined by frequency and amplitude, with high-frequency modes often associated with low-amplitude local vibrations. Conversely, low-frequency modes are usually global, correspond to high amplitudes, and cause larger motions such as domain movements. Some modes are harmonic in nature, while others, such as the low-to-medium frequency modes, which account for the majority, are anharmonic.30 Protein dynamics are described using a superposition of both harmonic and anharmonic modes. Anharmonic modes are considered to be the most functionally relevant because their motions cause atomic displacement. Also, it is anharmonicity, a deviation from the parabolic energy
Figure 2. Hypothetical energy landscape diagrams for three different proteins. Each landscape is defined by increasing energy on the y axis and a conformational coordinate on the x axis. The proteins, from left to right, are in order of increasing disorder. The peaks correspond to conformational energy barriers between the valleys of conformer populations. Protein A is a typical stable, folded, globular protein. Protein B depicts a molten globular protein with increasing disorder. Protein C illustrates a protein lacking a stable structure (i.e., an intrinsically disordered protein).
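As a rough numerical companion to the landscapes in Figure 2, the sketch below computes equilibrium conformer populations from the relative free energies of the valleys using Boltzmann weighting. The valley energies, the temperature, and the function name are hypothetical values chosen for illustration, not data from the text. Note that the valley depths set the equilibrium populations, whereas the barrier heights govern how quickly the populations interconvert.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Return fractional populations for valleys with the given free energies."""
    rt = R_KCAL * temperature
    lowest = min(energies_kcal)
    # Shift by the lowest energy so the largest weight is exactly 1.
    weights = [math.exp(-(energy - lowest) / rt) for energy in energies_kcal]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical valley energies (kcal/mol) for an ordered and a disordered protein.
folded = boltzmann_populations([0.0, 3.0, 4.0])
disordered = boltzmann_populations([0.0, 0.2, 0.4])

print("folded:    ", [round(p, 4) for p in folded])
print("disordered:", [round(p, 4) for p in disordered])
```

With valleys separated by several kcal/mol, nearly all of the population sits in the ground state; with separations well below 1 kcal/mol (comparable to kT at room temperature), every valley is appreciably populated.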
depict the energy surface ruggedness, an indicator of the conformational heterogeneity of a given protein. As a rule, landscapes are complex, and depending on the energy scale examined, they may contain multiple local minima within the valleys that dictate the thermodynamic and kinetic parameters of conformational transitions.33 The peaks and valleys illustrate various properties of the conformational equilibria such as relative population sizes, barriers between conformer populations, degree of harmonicity/anharmonicity (small molecules), and time-dependent population shifts where applicable. When the energy barriers between disparate valleys are low relative to the Boltzmann energy (kT), there may be significant populations of all of the conformers.23 The landscape picture describes conformational heterogeneity in terms of energy, whereas the conformational ensemble describes conformational heterogeneity in terms of atomic coordinates. Traditional structural biology methods have provided a wealth of data on highly populated ground (native) state structures. The landscape picture then highlights the conformational distribution that surrounds the ground state, a
Figure 3. Degrees of correlated motion between two secondary structural elements. (A) Secondary structures indicate correlated motion between the helix and sheet; the motion is characterized by similar frequency, amplitude, and direction. (B) There is no discernible correlated motion between the helix and sheet. (C) Anticorrelated motion between helix and sheet moving in opposite directions.
proteins. Indeed, predictions suggest that as many as 40% of proteins in the human proteome contain long disordered regions.38 It should be mentioned that the terms stable and unstable, as used above, both describe stable ensembles in the thermodynamic sense. Disordered ensembles, which contain large amounts of conformational entropy, are lower in free energy than a narrow distribution of conformers;39 the converse is true for ordered ensembles, provided that the enthalpy in both cases is identical. Experimentally, the two workhorses for the atomic-level characterization of conformational equilibria are crystallography and NMR. Conventional X-ray crystallography typically elucidates ground-state snapshots of the conformational ensemble. A broader detection of the conformational ensemble using crystallography has been aided by methods such as the use of anisotropic thermal parameters (three positional and six thermal, as opposed to one isotropic parameter for refinement),40 ambient-temperature crystallography,41 Laue diffraction (time-resolved data collection),42 and translation, libration, and screw-axis (TLS) refinement.43 NMR is more ideally suited for the study of molecular dynamics as a precise set of observables in that the methodology can (a) describe the full time scale of conformational exchange and (b) allow the characterization of sparsely populated conformers (populations existing at less than a few percent of total, with lifetimes that span nano- to milliseconds). In some cases, NMR experiments may be designed to report the corresponding exchange kinetics related to the rates of interconversion between substates and corresponding populations. 
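The statement above that a disordered ensemble is lower in free energy than a narrow distribution, at equal enthalpy, can be made concrete with a short sketch. It assumes the Gibbs entropy expression S = −R Σ p ln p over discrete conformer populations and compares only the −TS term; the populations and function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def conformational_entropy(populations):
    """Gibbs entropy, -R * sum(p ln p), in kcal/(mol K)."""
    return -R_KCAL * sum(p * math.log(p) for p in populations if p > 0.0)


def entropic_free_energy(populations, temperature=298.15):
    """The -T*S contribution to the free energy, in kcal/mol."""
    return -temperature * conformational_entropy(populations)


# Hypothetical ensembles: one dominant conformer versus ten equally populated ones.
narrow = [0.98, 0.01, 0.01]
broad = [0.1] * 10

print("narrow -TS (kcal/mol):", round(entropic_free_energy(narrow), 3))
print("broad  -TS (kcal/mol):", round(entropic_free_energy(broad), 3))
```

With identical enthalpy, the ensemble with the more negative −TS term has the lower free energy, which here is the broad distribution.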
Because of their transient nature, sparsely populated conformers are often not directly observed in the NMR spectrum.44 However, their indirect detection may be achieved using relaxation dispersion (RD),45,46 paramagnetic relaxation enhancement (PRE),47 or chemical-exchange saturation transfer (CEST) methods.48,49 Longer-lived constituents of disordered ensembles have been extensively characterized with residual dipolar couplings (RDCs), which are sensitive to a wide, pico- to millisecond, time scale of motion.50,51 From a theoretical perspective, MD simulations are the workhorse for atomic-level characterization of the conformational ensemble.14 Advances in computational infrastructure and algorithms have enabled long simulations up to the microsecond(s) time scale of systems in explicit solvent containing tens of thousands of atoms. Indeed, the use of graphics processing unit (GPU) clusters can accelerate MD simulations by 1 to 2 orders of magnitude over conventional CPUs,52 and cloud-based computing with, for example,
distribution, guided by dynamics, that often contains functionally relevant substates. Strictly speaking, the conformational ensemble is a dynamics−structure-based representation of the energy landscape. The Conformational Ensemble. Picosecond motions (high-frequency modes) often do not change the conformation of the molecule significantly. In fact, it is important to characterize motions between the nanosecond and second time scales, as it is in this regime that conformational heterogeneity likely exists and conformers of biological importance are often found. In the past, structural biology methods have not been entirely adequate for detecting functionally important conformers. The deficiency is primarily due to the time- and spatial-averaged nature of the methods,34 resulting in the inherent conformational equilibria being represented by one, or a few, structures. A more appropriate representation of the conformational equilibria is the conformational ensemble, a probability distribution of atomic coordinates. The ensemble is visualized as a superimposition of multiple structures that lie within a set energetic window, with the number of structures being proportional to population weighting. Moreover, the ensemble is governed by the energy landscape and may encompass conformers that are less stable (higher energy), transient, and often present in low populations (