Role of Conformational Flexibility in Monte Carlo Simulations of Many

Jan 11, 2019 - ... University of California, Irvine , Irvine , California 92697 , United States ... move within a predetermined library of discrete pr...
1 downloads 0 Views 3MB Size
Subscriber access provided by EKU Libraries

Biomolecular Systems

The role of conformational flexibility in Monte Carlo simulations of many-protein systems Bibhab Bandu Majumdar, Vera Prytkova, Eric K. Wong, J. Alfredo Freites, Douglas J. Tobias, and Matthias Heyden J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00894 • Publication Date (Web): 11 Jan 2019 Downloaded from http://pubs.acs.org on January 13, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The role of conformational flexibility in Monte Carlo simulations of many-protein systems Bibhab Bandhu Majumdar1, Vera Prytkova2, Eric K. Wong2, J. Alfredo Freites2, Douglas J. Tobias2,*, Matthias Heyden3,* 1

Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, D-45470 Mülheim an der Ruhr, Germany 2 3

Department of Chemistry, University of California, Irvine, Irvine, CA 92697, U.S.A.

School of Molecular Sciences, Arizona State University, Tempe, AZ 85287, U.S.A.

*to whom correspondence should be addressed: [email protected], [email protected]

Abstract: Efficient computational modeling of biological systems characterized by high concentrations of macromolecules often relies on rigid-body Brownian Dynamics or Monte Carlo (MC) simulations. However, the accuracy of rigid-body models is limited by the fixed conformation of the simulated biomolecules. Multi-conformation Monte Carlo (mcMC) simulations of protein solutions incorporate conformational flexibility via a conformational swap trial move within a pre-determined library of discrete protein structures, thereby alleviating artifacts arising from the use of a single protein conformation. Here, we investigate the impact of the number of distinct protein structures in the conformational library and the extent of conformational sampling used in its generation on structural observables computed from simulations of hen egg white lysozyme (HEWL), human gD-crystallin, and bovine gB-crystallin solutions. We find that the importance of specific protocols for the construction of the protein structure library is strongly dependent on the nature of the simulated system.

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Introduction The crowded conditions encountered by biomacromolecules in cellular environments dominate their mobility and stability as well as the nature of their interactions.1 Experimental studies involving biomolecules are mostly carried out in vitro using buffer solutions, often neglecting the effects of macromolecular crowding. Several experimental techniques2–8 have recently become available to study biomolecular interactions in crowded media both in vitro and in vivo. Atomistic simulations of such crowded systems in explicit solvent are still restricted to small numbers of biomolecules due to often prohibitive computational costs.9,10 Computationally inexpensive modeling approaches, such as rigid-body Brownian Dynamics and Monte Carlo (MC) simulations in implicit solvent, can provide a powerful alternative to fully atomistic simulations.11–13 However, the accuracy of rigid-body simulations is limited due to the neglect of conformational fluctuations. Recently, Prytkova et.al.14 described an efficient method to introduce conformational flexibility in MC simulations of many-protein systems. In this approach, termed multi-conformation Monte Carlo (mcMC), protein flexibility is incorporated via a conformational swap trial move from a pre-determined library of structures. In principle, the library of structures used in mcMC simulations can be generated by any suitable experimental or computational method that samples protein conformations. The probability of a conformational swap trial move can be used to describe free energy differences between protein conformations in the infinite dilution limit. The mcMC sampling then takes into account changes in the relative stability of conformations due to inter-protein interactions at increased concentration. While the mcMC approach represents an improvement compared to simulations based on a single conformation rigid-body simulation,14 a limited-size, discrete library of protein structures remains an approximate representation of the configurational space that describes the structure and dynamics of polypeptide chains. A natural question to ask in this context is how many independent samples of protein structures are needed to effectively describe molecular flexibility using a finite library of protein conformations? An associated query relates to the required protocol, which adequately captures the relevant portions of the protein configurational landscape. Here, we address these questions for 2

ACS Paragon Plus Environment

Page 2 of 29

Page 3 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

conformational libraries generated using all-atom molecular dynamics (MD) simulations of individual proteins in explicit solvent under infinite dilution conditions. We perform mcMC simulations of 10 mg/ml solutions of hen white lysozyme (HEWL), with a library of conformations generated from enhanced sampling simulations based on temperature replica exchange, and human gD-crystallin for which conformational libraries were constructed from long 40 µs MD trajectories. In addition, we consider the case of bovine gB-crystallin, which is homologous to gD-crystallin but with no net charge, using 500 ns MD simulation trajectories. We find that structural observables computed from the mcMC simulations converge rapidly for weakly associating systems such as the g-crystallin solutions or for solutions of HEWL under conditions where electrostatic repulsion dominates protein-protein interactions. In contrast, the specifics of the construction of the library of protein conformations become key in the structural characterization of strongly attractive systems such as HEWL solutions at high ionic strength.

Methods Protein-protein interaction potential. We employ the grid-based protein-protein interaction potential introduced by Mereghetti et al.11 for many-protein BD simulations in the SDAMM software package15, which consists of four contributions: the electrostatic interaction energy due to the electrostatic potential of one protein with the charges of a second protein;16 an electrostatic desolvation penalty due to bringing solvated polar groups of one protein into a low dielectric environment upon binding another protein with the simultaneous loss of their solvation shell;17 a short-range attractive non-polar desolvation interaction due to the burial of solvent exposed surface atoms of one protein by a second protein;15 and a soft-core repulsive interaction energy between atoms from different proteins. The two electrostatic contributions depend explicitly on the ionic strength of the solution. The non-polar desolvation interaction is typically modified by a varying scaling factor in the potential, which converts buried protein surface area into an energy. All proteins are described by an atomistic representation to precompute interaction potential terms on space-filling three-dimensional grids with a grid constant of 1 Å and 200 x 200 x 200 grid points.

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The electrostatic potential grids were computed at different ionic strengths (see Supporting Information) with atomic charges according to the OPLS force field18 by finite difference solution of the linearized Poisson-Boltzmann equation using the UHBD19 software package. Dielectric constants of 78.4 and 2.0 were used for the solvent and protein, respectively. A reduced number of charged sites in each protein was used during the evaluation of the electrostatic potential energies within the mcMC simulations using the effective charge approximation16 implemented in the SDAMM software package. SDAMM potential function parameters were employed as reported by Prytkova et al.14, except for the simulations of gB-crystallin, where a nonpolar desolvation parameter of −0.0343 kJ/mol/Å2 was used to match an experimental value of the osmotic second virial coefficient.20 Multi-conformation Monte Carlo simulation. The multi-conformation Monte Carlo (mcMC) algorithm extends the conventional rigid-body simulation scheme by introducing protein conformational flexibility through a trial move that attempts to swap conformations between a protein in the simulation system and a discrete library of protein structures. In this context, the probability of finding the system in the state 𝑖, 𝑝# , in the canonical ensemble is proportional to 𝑝# ~ 𝑤𝑖 exp −

𝐸# 𝑘- 𝑇

(1) where 𝐸# is the corresponding protein-protein interaction energy and 𝑤# is a statistical weight, independent of the Monte Carlo sampling, given by 𝑤# ~ exp −

𝐺# 𝑘- 𝑇

(2) where 𝐺# is the intrinsic free energy of all the protein conformations present in the state 𝑖, which includes all intramolecular and protein-solvent interactions in the infinite dilution limit.

4

ACS Paragon Plus Environment

Page 4 of 29

Page 5 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The underlying transition matrix element corresponding to the conformational swap move from state 𝑖 to state 𝑗, 𝛼#2 , is given by 𝛼#2 = 𝑤2 (3) In other words, in contrast to rigid-body translational and rotational trial moves, the Markov chain underlying the conformational swap trial move consists of independent events. Imposing detailed balance leads to the following acceptance criterion min 1,

𝛼2# 𝑝2 ∆𝐸 = min 1, exp − 𝑘- 𝑇 𝛼#2 𝑝#

(4) where ∆𝐸 = 𝐸2 − 𝐸# , which is identical to the acceptance criterion associated with the rigid-body motion trial moves. Thus, only the ordinary Metropolis21 acceptance criterion is needed in mcMC simulations. In summary, the statistical weights 𝑤# describe the relative probabilities of the individual conformations 𝑖 in dilute solution, which is provided as an input to the mcMC simulation. In the absence of protein-protein interactions, the observed relative probabilities 𝑝# are directly proportional to 𝑤# following Eq. 1 for zero protein-protein interactions 𝐸# . For mcMC simulations of systems with multiple proteins and non-zero interaction energies 𝐸# , the 𝑝# are modulated relative to the statistical weights 𝑤# . When compared to the statistical weight 𝑤# , the probability 𝑝# of conformation 𝑖 as sampled in mcMC simulations therefore reflects the relative stabilization or destabilization free energy, ∆∆𝐺# , due to protein-protein interactions relative to dilute solution conditions ∆∆𝐺# = −𝑅 𝑇 ln

𝑝# 𝑤#

(5)

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The mcMC algorithm has been implemented14 into a customized version of the SDAMM software package.15 In this implementation, a Monte Carlo cycle comprises as many trial moves as the total number of proteins in the system. In a trial move, a rigid-body translation, rigid-body rotation, or conformational swap is chosen at random with equal probability, and applied to a single protein in the simulation system also selected at random with equal probability. The maximum step sizes of rigid-body translational and rotational trial moves are adjusted to yield an acceptance ratio of approximately 50%. In the conformational swap trial move an alternative protein conformation is chosen from the protein structure library with probability given by equation (3). The library of protein conformations and the corresponding probability distribution (the statistical weights 𝑤# ) are sampled from extensive all-atom MD simulations of a single protein in explicit solvent, as described in the next section. All mcMC simulations were carried out at a concentration of 10 mg/ml using 200 protein molecules. The HEWL and gB-crystallin simulations were run at 300 K, and the gDcrystallin simulations were run at 310 K. See Supporting Information for more details. Generation of protein structure libraries. The library of protein structures employed in the mcMC simulation can be generated using several methods. Here, we used a 400-ns replica exchange molecular dynamics (REMD) simulation for HEWL, and conventional equilibrium MD simulation trajectories of 40 µs and 500 ns for gD- and gB-crystallin, respectively, all under infinite dilution conditions. Different simulation protocols and employed force fields for the individual proteins limit direct comparisons between them. Instead, we mainly compare mcMC simulations based on distinct conformational libraries extracted from the same simulation. To generate protein structure libraries, we computed the heavy atom RMSD matrix between protein configurations over selected portions of each trajectory (see Table S1, S2 and S3 of the Supporting Information), and performed a cluster analysis using the algorithm of Daura et al.22, implemented in the GROMACS software package23, with a RMSD cutoff of 1.0 Å for HEWL and gB-crystallin. For gDcrystallin, a larger RMSD cutoff of 1.7 Å was chosen to prevent large numbers of clusters with single structures within the RMSD cutoff of the trajectory. The population of each cluster was used to determine the statistical weights and corresponding transition matrix 6

ACS Paragon Plus Environment

Page 6 of 29

Page 7 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

elements used in the conformational swap trial move to describe the relative stability of each conformation in dilute conditions (cf. equations 2 and 3). A 400 ns REMD simulation was performed in explicit solvent starting from a solution NMR structure of HEWL (PDB ID 1E8L).24 The protonation states of the protein at pH 7 were set using the H++ webserver resulting in a total protein charge of +8.25 A temperature range of 300-375K was chosen and a REMD temperature generator webserver (http://folding.bmc.uu.se/remd/) was used to determine the optimum number of replicas.26 REMD trajectories for 22 replicas were generated using the GROMACS 4.5.6 software package.23 The OPLS all-atom force field18 was used for the protein and ions, and the TIP3P model was used for water27. The system was neutralized with eight chloride ions and solvated by 8819 water molecules. Covalent bonds and the geometry of water molecules were constrained with the LINCS28 and SETTLE28 algorithms, respectively. Short-range interactions were truncated with a 9 Å cut-off and long-range electrostatics were computed using the fast smooth particle-mesh Ewald method29 on a 1.2 Å grid. A time step of 2 fs was used to propagate the system. After an energy minimization, the system was equilibrated for 100 ps at constant temperature and pressure applying harmonic restraints on the protein heavy atom positions using a force constant of 1000 kJ mol-1 nm-2. This was followed by a consecutive 100 ps unrestrained equilibration. During these initial equilibrations, a Berendsen weak coupling thermostat and barostat30 was used with a target temperature of 300 K and a pressure of 1 bar, respectively, with a 1 ps time constant. This equilibrated system was then used to generate the 22 temperature replicas, which were each pre-equilibrated in the NPT ensemble at their respective temperature for 1 ns with a 5 ps time constant for the weak coupling thermostat and barostat. The REMD production simulation was performed with replica exchange attempts every 200 fs. Therein, the Nosé-Hoover thermostat31 and the Parrinello-Rahman barostat32 were used for temperature and pressure coupling with 5 ps period times. The gD-crystallin simulation system was built from the crystal structure reported by Basak et al.33 (PDB ID code 1HK0). The solution state NMR structure of the gD-crystallin P23T variant (PDB ID code 2KFB)34 was used to determine the histidine protonation states for wild-type gD-crystallin. The gB-crystallin simulation system was built from the crystal 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

structure reported by Kumaraswamy et al.35 (PDB ID code 1AMM) with protonation states consistent with neutral pH. The CHARMM3636 force field was used for protein and ions, and the TIP3P model was used for water27. The proteins were solvated in a cubic 80 Å x 80 Å x 80 Å box with the nearest box boundary at least 12 Å from the protein. Crystal structure waters were preserved and the gD-crystallin system was neutralized with chloride counterions. The resulting systems contain 48,309 and 48,222 atoms for gDcrystallin and gB-crystallin, respectively. The VMD 1.9.1 software package37 was used to assemble the systems. The initial part of the gD-crystallin simulation was first carried out for 20 ns using the NAMD 2.9 software package38 following 10,000 steps of conjugate-gradient energy minimization and a 200 ps MD simulation during which harmonic positional restraints placed on each protein heavy atom were gradually relaxed. The gB-crystallin simulation was run for 500 ns using the NAMD 2.11 software package, with a pre-equilibration consisting of 500 steps of conjugate-gradient energy minimization and a 500 ps harmonically restrained run. Both simulations were run at constant temperature (300 K for gB-crystallin and 310 K for gD-crystallin) and pressure (1 atm). The smooth particle mesh Ewald method29,39 was used to calculate electrostatic interactions. Short-range, real-space interactions were cut off at 11 Å by means of a switching function. A reversible, multiple time-step algorithm40 was used to integrate the equations of motion for the gBcrystallin system with a time step of 4 fs for electrostatic forces, 2 fs for short-range nonbonded forces, and 1 fs for bonded forces. All bond lengths involving hydrogen atoms were held fixed using the SHAKE41 and SETTLE28 algorithms. A Langevin dynamics scheme was used for temperature control, and a Nosé-Hoover-Langevin piston was used for pressure control.42,43 The gD-crystallin simulation was extended for another 46-μs on the Anton 2 supercomputer, a special-purpose computer for molecular dynamics simulations of biomolecules44. The CHARMM36 force field36 was used for protein and ions, and the TIP3P model was used for water27. The reversible, multiple time-step algorithm (RESPA)45 was used to integrate the equations of motion every three timesteps for longrange non-bonded forces, and every timestep for short-range nonbonded and bonded 8

ACS Paragon Plus Environment

Page 8 of 29

Page 9 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

forces. The k-Gaussian split Ewald method46 was used for long-range electrostatic interactions. All bonds lengths involving hydrogen atoms were fixed using the SHAKE41 algorithm. The simulation was performed at constant temperature (310 K) and pressure (1 atm) using Nose-Hoover chains47 and the Martyna-Tobias-Klein barostat42. The RESPA algorithm and temperature and pressure controls were implemented using the multigrator scheme, allowing the simulation to run with a 2.5 fs timestep48. The last 40 μs of this trajectory were used in the generation of the protein structure libraries for mcMC. The use of conformational libraries generated in simulations of individual proteins in dilute solution implies the approximation that all structures relevant for the formation of multimers are also sampled in the absence of protein-protein interactions. The latter can be problematic for induced-fit mechanisms, in which specific conformations are only populated to a measureable extent upon complex formation. In such cases, the simulation protocol for the generation of the conformational libraries would need to be modified.

Results and Discussion As described above, protein conformational flexibility is introduced in mcMC simulations by means of conformational swaps between structures drawn from a discrete conformational library, which was generated from all-atom MD simulations in infinite dilution conditions. To assess how the specifics of the library generation protocol affect structural properties of protein solutions simulated with mcMC, we performed simulations of protein solutions at a mass concentration of 10 mg/ml containing 200 HEWL, gD- or gB-crystallin proteins. The underlying corresponding conformational libraries were generated from a 400-ns REMD simulation, a 40-µs trajectory and a 500-ns MD simulation trajectory, respectively, as outlined in the previous section. Figure 1 illustrates the influence of the library size N on protein-protein interactions in HEWL solutions at various ionic strengths using radial distribution functions g(r) computed from protein center coordinates. The observations are strongly dependent on the ionic strength. Therefore, due to relatively weak electrostatic screening, electrostatic repulsion dominates at low ionic strengths. As a result, only weak association of the proteins is observed at the lowest ionic strength of 100 mM as indicated by the small amplitude of 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. Protein-protein radial distribution functions for HEWL solutions at a concentration of 10 mg/ml obtained in mcMC simulations with varying conformational libraries. Compared are simulations using conformational libraries formed by the (1, 5, 20, 50) most populated cluster conformations observed in a 400 ns REMD trajectory in mcMC simulations at varying ionic strengths. first maximum of the g(r). This behavior is largely independent of the number of protein the structures in the conformational library underlying the simulation apart from a shift in the first maximum for simulations based on the smallest conformational library with 5 structures. Structural differences between distinct conformations mainly become relevant upon formation of direct contacts, hence no significant influence of the conformational library is expected for weakly associating systems. For higher ionic strengths, the first maximum of the g(r) increases, indicating the formation of larger numbers of direct contacts, and differences between simulations based on distinct conformational libraries become visible. At ionic strengths of 300 mM and 500 mM, simulations based on 5 structures result in an increased overall attraction between the proteins as indicated by an increased peak for the first maximum of the g(r). Via integration of the respective g(r), we determine the probability of finding another lysozyme protein within a distance of 40 Å of a given protein, which yields 21% and 27% at 300 mM and 500 mM ionic strength, respectively, for simulations with the 5-membered conformational library. In simulations based on conformational libraries containing 100 structures, which include the largest degree of protein flexibility, the probability of finding another protein within 40 Å decreases to 17% and 20% at ionic strengths of 300 mM and 500 mM, respectively, for otherwise identical conditions. The largest variations between simulations of distinct conformational 10

ACS Paragon Plus Environment

Page 10 of 29

Page 11 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

libraries are observed at 500 mM, where the protein-protein interactions are most attractive and the formation of direct contacts is most prevalent. Because gD- and gB-crystallin have a lower net charge at neutral pH compared to HEWL (+2e and 0, respectively), electrostatic repulsions are less relevant. Therefore, interprotein interactions are less dependent on the ionic strength and contact formation at low ionic strengths (100 mM) are more likely a priori than in HEWL solutions. However, the overall affinity between the gD- and gB-crystallin proteins is comparatively low, which contributes to the high solubility of these species. Consistent with the results of singleconformation rigid-body MC simulations of HEWL reported by Prytkova et.al.,14 the use of a single protein structure in either crystallin simulation results in rugged features in the protein-protein g(r) (Fig. 2). These suggest the presence of preferred dimeric structures with well-defined distances of protein centers, which are typically an artifact caused by the fixed side-chain conformations on the protein surface. The latter can lead to dimers with an exaggerated relative stability of specific contacts between extended sidechains due to the missing description of the conformational entropy. Reducing the impact of such artifacts is one of the main virtues of the mcMC approach. Notably, in contrast to HEWL, simulations based on conformational libraries consisting of 5 structures result in

Figure 2. Protein-protein radial distribution functions for gD- and gB-crystallin solutions at a concentration of 10 mg/ml obtained in mcMC simulations at 100 mM ionic strength with varying conformational libraries. Compared are simulations using conformational libraries formed by the (1, 5, 20, 50) most populated cluster conformations observed in 40 µs and 500 ns MD trajectories, respectively. 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

essentially indistinguishable radial distribution functions compared to simulations based on conformational libraries with 20 or 50 structures (Fig. 2). Hence, simulations based on a much smaller number of conformations provide consistent information on the average affinity of the crystallin proteins. As described in detail in the Methods section, the transition matrix of the conformational swap trial move is generated from the relative populations of clusters extracted from single protein MD simulation trajectories, whose centroids constitute the mcMC protein structure library. Within the timescale of each MD simulation trajectory, these cluster populations essentially encode the relative stability of each protein conformation included in the library under infinite dilution conditions, i.e. in absence of protein-protein interactions. In the analyses reported in Figs. 1 and 2, the protein structures were added to the library in a ranked fashion, starting with the centroid of the most frequently observed conformational cluster in the corresponding MD simulation. Thus, increasing the size of the conformational library adds protein structures that are less likely to be encountered in dilute conditions. The probability of observing a given conformation in a non-dilute solution can be modified by protein-protein interactions, for example, due to preferential association of proteins in a given conformation. Changes in the relative stability of protein structures within the employed conformational library are sampled by the mcMC algorithm and can be quantified by relative changes of the probability to find a protein in a given

Figure 3. Dilute-limit probabilities as described by the relative cluster populations in single-protein MD simulations for conformational libraries containing 50 structures (shown as bars) compared to observed probabilities in mcMC simulations of protein solutions at 10 mg/ml (shown as colored points). 12

ACS Paragon Plus Environment

Page 12 of 29

Page 13 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

conformation relative to the dilute limit ensemble. This is shown in Fig. 3 for mcMC simulations based on conformational libraries constructed from the 50 most prevalent conformations observed in single protein MD simulations. Overall, deviations from dilutelimit probabilities are small for the given mcMC simulation conditions. Notable changes are limited to a few structures of the HEWL proteins. Changes in the relative probability of conformations of the gD- and gB-crystallin proteins are detectable within the statistical accuracy of our simulations, but largely invisible on the scale of Fig. 3. A larger impact of protein-protein interactions on the relative probability of individual structures has been observed previously for mcMC simulations of crowded HEWL solutions (169 mg/ml) in which frequent protein-protein encounters are sterically unavoidable.14 A more detailed view on the stabilization of specific protein structures upon the formation of contacts and multimeric structures is provided in Fig. 4, where we distinguish between monomeric HEWL proteins and multimers in mcMC simulations of protein solutions at a concentration of 10 mg/ml. With increasing salt concentration, the fraction of monomeric proteins decreases, while the number of dimers and trimers increases due to enhanced electrostatic screening (Fig. 4 a-c). Deviations in the relative probability 𝑝# of finding a protein in a specific conformation 𝑖 relative to the probability under dilute conditions 𝑝#< = 𝑤# , described by the cluster populations in single protein MD simulations from which the conformational libraries were obtained, can be expressed as a free energy difference (Eq. 5). Averaged over all proteins in the solution, stabilization and destabilization effects are minimal as indicated by the scale between +0.2 and -0.2 kJ/mol in the left panel of Fig. 4d. For monomeric proteins in the solution, the effect is further weakened due to the absence of any protein contacts, which could stabilize or destabilize any given conformation (middle panel in Fig. 4d). However, for proteins in multimeric states the changes in the relative stability are more significant and on the order of +/-1.0 kJ/mol (right panel in Fig. 4d). While these differences are still less than the thermal energy, they are sufficient to modulate the population of specific conformations by up to 50% relative to their population in dilute solution. The results confirm that changes in the relative stability of distinct conformations rely on the formation of direct protein contacts. With increasing ionic strength and therefore increasing propensity to form protein-protein 13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

contacts, stabilization and destabilization effects of library conformations generally increase when analyzed for all proteins in the solution (left panel in Fig. 4d). However, if we focus the analysis on proteins in multimeric states, no significant dependence of conformational stabilizations and destabilizations on the ionic strength is observed (right panel in Fig. 4d), because this pre-selection already implies the presence of direct proteinprotein contacts that are not strongly modulated by electrostatic screening effects. The general agreement between the stabilization and destabilization free energies of

Figure 4. Formation of protein aggregates and stabilization/destabilization of specific conformers due to protein-protein interactions in HEWL solutions at concentration of 10 mg/ml at varying ionic strengths. (a-c) Fraction of proteins in monomeric, dimeric and trimeric states defined by protein center-to-center distances of less than 40 Å. (d) Effect of protein-protein interactions in mcMC simulations on the relative stability of individual HEWL conformations in a library of 50 structures. Shown are changes in the relative free energy for all proteins (left), proteins monomers (middle) and proteins in multimeric states (right) as defined in (a-c). Colors indicate observations for simulations at different ionic strengths as indicated in panel (a). Error bars indicate standard deviations obtained from the analysis of 6 independent segments of the mcMC trajectories. 14

ACS Paragon Plus Environment

Page 14 of 29

Page 15 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

individual conformations in multimeric states for simulations at varying ionic strength indicates the statistical convergence of the simulations. Overall, the stabilization and destabilization of protein conformations in multimeric states can modulate the effective binding free energy and aggregation propensity of proteins. Such effects become particularly noticeable in systems already prone to form multimeric states, when stabilizations/destabilizations of specific conformations become more pronounced as indicated by Fig. 4d. Consequently, structural properties of simulated protein solutions with a high overall affinity between proteins, such as the HEWL solutions at high ionic strengths, become dependent on details of the conformational library employed to mimic protein flexibility in mcMC. On the other hand, in systems with dominantly repulsive or only weakly attractive interactions, such as for the HEWL solutions at low ionic strength and solutions of the crystallins, structural properties like the g(r) between protein centers become less dependent on details of the conformational library used to describe protein flexibility. After identifying overall conditions for which details of the conformational library, i.e., the number of protein structures in the library, are most relevant, it is important to ask how properties of protein solutions simulated with the mcMC algorithm depend on the protocol with which the conformational library was produced. The length of the MD simulation trajectory used to generate the library of structures and the fraction of the conformational space it explores is a complementary aspect to consider in addition to the library size, i.e., the number of independent structures it contains. Both factors play a key role for a

Figure 5. Protein-protein radial distribution functions from mcMC simulations of HEWL solutions with a library of 50 protein structures sampled using four different REMD trajectory lengths (from left to right: 100, 200, 300 and 400 ns). 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

sufficient description of protein flexibility in mcMC simulations. Fig. 5 shows the proteinprotein radial distribution function from simulations of HEWL at varying ionic strengths generated with a library of 50 protein structures sampled from 100 ns, 200 ns, 300 ns and the total 400 ns of the single protein REMD trajectory. As discussed previously, the amplitude of the first maximum of the g(r) increases with increasing ionic strength due to enhanced electrostatic screening of the total protein charge. However, in addition we find changes in the average protein-protein interactions as the REMD sampling time underlying the generation of the structural library is increased. The effect is most notable at high ionic strengths, when the overall aggregation propensity increases and structural details in the conformational library become more important as established earlier. Integration of the g(r) for simulations at an ionic strength of 500 mM provides the probability of finding another protein within 40 Å, which amounts to 16%, 20%, 22% and 20% for simulations based on conformational libraries obtained from 100 ns, 200 ns, 300 ns and 400 ns of the REMD trajectory, respectively. The sudden increase upon increasing the sampling time from 100 ns to 200 ns indicates the occurrence of new conformations that enhance protein-protein interactions but are not sufficiently explored within the first 100 ns. This is consistent with the conformational space of HEWL sampled on these different timescales, which is analyzed in detail in Fig. S1 and S2 of the Supporting Information (SI). In particular, increased numbers of structures with large solvent accessible surface areas (SASA) for sampling times greater than 100ns play a significant role for attractive protein-protein interactions. The latter is indicated by a correlation of the SASA with the negative stabilization free energy of individual conformations in multimeric states shown in Fig. S3 of the SI. Increasing the REMD sampling time to 300 ns and 400 ns has a smaller effect, but changes in the shape of the first peak in the g(r) and in the probability of dimer formation suggest that new structures occurring in the second half of the REMD trajectory promote the formation of slightly distinct protein dimer configurations in the simulated HEWL solutions. The formation of larger multimers (trimers etc.) plays a minor role as dimers constitute the dominant form

16

ACS Paragon Plus Environment

Page 16 of 29

Page 17 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 6. Protein-protein radial distribution functions from mcMC simulations of gD solutions with libraries of 50 protein structures sampled using four different MD trajectory lengths. of HEWL aggregates in the simulated protein solutions at 10 mg/ml (see Fig. 4), in agreement with previous simulations and experimental studies.11,49,50 Protein-protein radial distribution functions from mcMC simulations of gD-crystallin generated in mcMC simulations with a library of 50 protein structures sampled from MD trajectory lengths of 500 ns through 40 µs are essentially independent on MD sampling time (Fig. 6). This is due in part to the limited internal conformational dynamics of this protein fold, which are sufficiently sampled on sub-µs timescales. An RMSD time trace for the gD-crystallin protein (Fig. S4 of the SI) shows no significant change in the sampled conformational space for the final 40 µs of the trajectory used for the generation of conformational libraries. Further, our observations from mcMC simulations of HEWL (Fig. 1, 4 and 5) and crystallin solutions (Fig. 2 and 6) at low ionic strength suggest that simulations of weakly associating systems do not significantly depend on detailed characteristics of the conformational libraries. The shapes of the first peak in the radial distribution functions for all the systems considered here, in particular the presence of various smooth shoulders, suggest the formation of dimer structures with specific relative orientations of the bound proteins and the presence of preferred interaction sites. To assess the effect of distinct conformational libraries in mcMC simulations on the sampling of specific binding modes in the formation of protein dimers, we computed spatial distribution functions (SDF) of other proteins in 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the coordinate system of a reference protein. SDFs represent the spatial density distribution of protein centers-of-mass around the tagged reference protein at the origin, normalized by the bulk density.14 The results are shown for simulations of HEWL at ionic strengths of 100 mM and 500 mM in Fig. 7 and for the simulations of gD-crystallin (ionic strength 100 mM) in Fig. 8. In each case, we follow the evolution of the SDF with increasing REMD or MD sampling used for the construction of conformational libraries with 50 structures. Isosurfaces for fixed local densities relative to the bulk solution are shown to indicate the shape of the SDFs. The contour value was chosen for clarity in each system by compensating for variations in the highest density observed in the first coordination shell due to differences in the overall aggregation propensity and specificity of binding to a particular site.

Figure 7. Spatial distribution functions from mcMC simulations of HEWL solutions with libraries containing 50 protein structures obtained from REMD trajectories of increasing length (from left to right: 100, 200, 300 and 400 ns) for ionic strenghs of 100 mM (top panels) and 500 mM (bottom panels). Isosurfaces are shown for 2.5 and 6.0 times the bulk density of proteins in the solution for the simulation at 100 mM and 500 mM, respectively. The protein structure with the highest statistical weight sampled from the 400 ns REMD trajectory is shown in all panels in the same orientation.

18

ACS Paragon Plus Environment

Page 18 of 29

Page 19 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 8. Spatial distribution functions from mcMC simulations of gD-crystallin solutions with libraries of 50 protein structures sampled using four different MD trajectory lengths. The isosurface contours (shown in green) correspond to 7.5 times the bulk density. The protein structure with the highest statistical weight sampled from the 40 µs MD trajectory is shown in all panels in the same orientation, with the C-terminal domain on the right side. In mcMC simulations of the HEWL solutions at both low and high ionic strength, the preferred interaction sites tend to be more populated in simulations with libraries sampled from REMD trajectories of 200 ns length and longer, which is consistent with the corresponding radial distribution functions (Fig. 5). However, although there are minor differences among the SDFs, the main modes of interaction appear to be preserved in simulations at the same ionic strength after structures from the first 200 ns of the REMD trajectory are included into the conformational library. Variations between preferred binding sites only become apparent for changes in ionic strength. A preferred binding site at an ionic strength of 100 mM (right hand side in each figure panel) becomes depopulated at ionic strengths of 500 mM. This is compensated by preferred binding in other locations in the protein environment at an ionic strength of 500 mM, which are not populated at an ionic strength of 100 mM (top left in each figure panel). This observation indicates that changes in ionic strength not only affect the overall aggregation propensity of the proteins, but also their preferred relative position (and orientation) in complexes because of modified effective electrostatic interactions. The SDF analysis for gD-crystallin reveals a dominant interaction surface on the Cterminal domain that is essentially independent of the extent of conformational sampling used to generate the protein structure library (see Fig. 8). There are two secondary 19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

interaction regions, one on either protein domain, whose extent and specific location vary among the simulations. This variability may be attributed to the accumulated effect of rare conformational excursions that occur at intervals on the order several µs (see Fig. S4 of the SI).

Summary In the characterization of protein-protein interactions using Monte Carlo simulations of many-protein systems, mcMC augments the computational efficiency of rigid-body modeling approaches with increased realism through the incorporation of a conformational swap trial move. The latter approximates the intramolecular degrees of freedom of a single protein and its interactions with the solvation shell via a finite-size library of discrete protein conformations, including information on relative stabilities of individual conformations in dilute solution. Moreover, mcMC eliminates the undesirable dependence, observed in rigid body simulations, on the initial structure chosen to initiate the simulation.14 The conformational library is generated using atomistic MD simulations in infinite dilution conditions. We demonstrated here that, in the case of weakly-interacting systems or when electrostatic repulsion dominates inter-protein interactions, the library size and the extent of conformational sampling involved in the generation of the conformational library do not play a crucial role in the description of protein-protein interactions. However, these factors do become relevant under conditions in which protein-protein interactions are increasingly attractive (e.g., in HEWL solutions at high ionic strength). We find for each studied system that a finite number of protein structures in the conformational library is sufficient to obtain consistent results, which do not change upon increasing the number of structures in the library of discrete conformations to describe protein flexibility. On the other hand, we find that insufficient conformational sampling during the generation of the library of structures can also result in variations of effective protein-protein interactions. Within the systems studied here, this effect manifests itself prominently in the underestimation of attractive protein-protein interactions in HEWL solutions simulated with conformational libraries generated from REMD trajectories less than 200 ns in length. 20

ACS Paragon Plus Environment

Page 20 of 29

Page 21 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

However, we also find that specific high-affinity binding modes do not change significantly with prolonged conformational sampling. This result is encouraging because it demonstrates that information on the most relevant modes of interaction and binding interfaces, which may lead to dimer and oligomer formation and eventual aggregation, can already be obtained with a limited description of the protein conformational flexibility. Predicted dimer structures can then be used as starting points for additional simulations with fully flexible proteins, e.g. using atomistic MD, which can then be used to study the interactions in more detail. Overall, the impact of conformational flexibility for simulations of interacting proteins is determined by several factors: 1) the specific proteins to be studied, 2) solvent conditions such as ionic strength and the resulting aggregation propensity, and 3) the physical properties that are being analyzed. To summarize the impact of conformational flexibility on overall attractive or repulsive protein-protein interactions, we provide an analysis of the osmotic second virial coefficient B22 obtained from simulations of HEWL solutions at varying ionic strengths in Fig. S5 of the SI for distinct conformational libraries. The analysis of this experimental observable14,51 demonstrates the impact of the conformational library, in particular of the library size, at elevated ionic strengths between 300 and 500 mM, which indicates quantitative changes in the aggregation propensity. However, no significant changes based on typical experimental error bars are observed at lower ionic strengths of 100 mM. In summary, this study shows that the modeling of conformational flexibility in proteins via finite-sized conformational libraries of discrete structures provides a promising route to accurate and predictive simulations of protein-protein interactions in multi-protein systems. The low computational cost, which is equivalent to rigid-body Brownian dynamics simulations, allows for exhaustive sampling of systems containing hundreds of interacting proteins, and to characterize their aggregation propensity. Within the limitations of the accuracy of the underlying interaction potentials, which need to include solvent-mediated interactions between the proteins, this technique allows for studies of protein-protein interactions in the absence and presence of crowding environments, protein aggregation, and the self-assembly of multimeric protein complexes. 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Supporting Information Tables with details for the individual simulations, analysis of the conformational space sampled in REMD simulations of HEWL and conventional MD of gD-crystallin, convergence of second virial coefficients of the osmotic pressure with size of the structural library and REMD sampling time for mcMC simulations of HEWL solutions.

Acknowledgements B. M. and M. H. are thankful for financial support from the Cluster of Excellence RESOLV (EXC 1069) funded by the Deutsche Forschungsgemeinschaft. This work was supported in part by the National Science Foundation (grant DMR-1410415) and the National Institutes of Health (grant R01 EY025328).

References (1)

Ellis, R. J. Macromolecular Crowding: Obvious but Underappreciated. Trends in Biochemical Sciences. 2001, pp 597–604.

(2)

Miklos, A. C.; Li, C.; Sharaf, N. G.; Pielak, G. J. Volume Exclusion and Soft Interaction Effects on Protein Stability under Crowded Conditions. Biochemistry 2010, 49 (33), 6984–6991.

(3)

Xie, X. S.; Choi, P. J.; Li, G.-W.; Lee, N. K.; Lia, G. Single-Molecule Approach to Molecular Biology in Living Bacterial Cells. Annu. Rev. Biophys. 2008, 37 (1), 417–444.

(4)

Pastor, I.; Vilaseca, E.; Madurga, S.; Garcés, J. L.; Cascante, M.; Mas, F. Diffusion of Alpha-Chymotrypsin in Solution-Crowded Media. A Fluorescence Recovery after Photobleaching Study. J. Phys. Chem. B 2010, 114 (11), 4028– 4034.

(5)

Weiss, M. Probing the Interior of Living Cells with Fluorescence Correlation Spectroscopy. Ann N Y Acad Sci 2008, 1130, 21–27. 22



ACS Paragon Plus Environment

Page 22 of 29

Page 23 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(6)

Li, C.; Wang, G.-F.; Wang, Y.; Creager-Allen, R.; Lutz, E. a; Scronce, H.; Slade, K. M.; Ruf, R. a S.; Mehl, R. a; Pielak, G. J. Protein 19F NMR in Escherichia Coli. J. Am. Chem. Soc. 2010, 132 (1), 321–327.

(7)

Senske, M.; Törk, L.; Born, B.; Havenith, M.; Herrmann, C.; Ebbinghaus, S. Protein Stabilization by Macromolecular Crowding through Enthalpy Rather Than Entropy. J. Am. Chem. Soc. 2014, 136 (25), 9036–9041.

(8)

Gnutt, D.; Gao, M.; Brylski, O.; Heyden, M.; Ebbinghaus, S. Excluded-Volume Effects in Living Cells. Angew. Chemie - Int. Ed. 2015, 54 (8), 2548–2551.

(9)

Feig, M.; Sugita, Y. Variable Interactions between Protein Crowders and Biomolecular Solutes Are Important in Understanding Cellular Crowding. J. Phys. Chem. B 2012, 116 (1), 599–605.

(10) Feig, M.; Sugita, Y. Reaching New Levels of Realism in Modeling Biological Macromolecules in Cellular Environments. J. Mol. Graph. Model. 2013, 45, 144– 156. (11) Mereghetti, P.; Gabdoulline, R. R.; Wade, R. C. Brownian Dynamics Simulation of Protein Solutions: Structural and Dynamical Properties. Biophys. J. 2010, 99 (11), 3782–3791. (12) McGuffee, S. R.; Elcock, A. H. Diffusion, Crowding & Protein Stability in a Dynamic Molecular Model of the Bacterial Cytoplasm. PLoS Comput. Biol. 2010, 6 (3), e1000694. (13) Lund, M.; Jönsson, B. A Mesoscopic Model for Protein-Protein Interactions in Solution. Biophys. J. 2003, 85 (5), 2940–2947. (14) Prytkova, V.; Heyden, M.; Khago, D.; Freites, J. A.; Butts, C. T.; Martin, R. W.; Tobias, D. J. Multi-Conformation Monte Carlo: A Method for Introducing Flexibility in Efficient Simulations of Many-Protein Systems. J. Phys. Chem. B 2016, 120 (33), 8115–8126. (15) Martinez, M.; Bruce, N. J.; Romanowska, J.; Kokh, D. B.; Ozboyaci, M.; Yu, X.; 23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Öztürk, M. A.; Richter, S.; Wade, R. C. SDA 7: A Modular and Parallel Implementation of the Simulation of Diffusional Association Software. J. Comput. Chem. 2015, 36 (21), 1631–1645. (16) Gabdoulline, R. R.; Wade, R. C. Effective Charges for Macromolecules in Solvent. J. Phys. Chem. 1996, 100 (9), 3868–3878. (17) Elcock, A. H.; Gabdoulline, R. R.; Wade, R. C.; McCammon, J. A. Computer Simulation of Protein-Protein Association Kinetics: AcetylcholinesteraseFasciculin. J. Mol. Biol. 1999, 291 (1), 149–162. (18) Jorgensen, W. L.; Tirado-Rives, J. The OPLS [Optimized Potentials for Liquid Simulations] Potential Functions for Proteins, Energy Minimizations for Crystals of Cyclic Peptides and Crambin. J. Am. Chem. Soc. 1988, 110 (6), 1657–1666. (19) Madura, J. D.; Briggs, J. M.; Wade, R. C.; Davis, M. E.; Luty, B. A.; Ilin, A.; Antosiewicz, J.; Gilson, M. K.; Bagheri, B.; Scott, L. R.; McCammon, J. A. Electrostatics and Diffusion of Molecules in Solution: Simulations with the University of Houston Brownian Dynamics Program. Comput. Phys. Commun. 1995, 91 (1–3), 57–95. (20) Bucciarelli, S.; Mahmoudi, N.; Casal-Dujat, L.; Jéhannin, M.; Jud, C.; Stradner, A. Extended Law of Corresponding States Applied to Solvent Isotope Effect on a Globular Protein. J. Phys. Chem. Lett. 2016, 7 (9), 1610–1615. (21) Metropolis, N.; Ulam, S. The Monte Carlo Method. J. Am. Stat. Assoc. 1949, 44 (247), 335. (22) Daura, X.; Gademann, K.; Jaun, B.; Seebach, D.; van Gunsteren, W. F.; Mark, A. E. Peptide Folding: When Simulation Meets Experiment. Angew. Chemie Int. Ed. 1999, 38 (1–2), 236–240. (23) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4 (3), 435–447. 24

ACS Paragon Plus Environment

Page 24 of 29

Page 25 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(24) Schwalbe, H.; Grimshaw, S. B.; Spencer, A.; Buck, M.; Boyd, J.; Dobson, C. M.; Redfield, C.; Smith, L. J. A Refined Solution Structure of Hen Lysozyme Determined Using Residual Dipolar Coupling Data. Protein Sci. 2001, 10 (4), 677–688. (25) Gordon, J. C.; Myers, J. B.; Folta, T.; Shoja, V.; Heath, L. S.; Onufriev, A. H++: A Server for Estimating pK(a)s and Adding Missing Hydrogens to Macromolecules. Nucleic Acids Res. 2005, 33 (Web Server issue), W368–W371. (26) Patriksson, A.; van der Spoel, D. A Temperature Predictor for Parallel Tempering Simulations. Phys. Chem. Chem. Phys. 2008, 10 (15), 2073–2077. (27) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926. (28) Miyamoto, S.; Kollman, P. A. Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13 (8), 952– 962. (29) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103 (19), 8577– 8593. (30) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81 (8), 3684–3690. (31) Nosé, S. A Molecular Dynamics Method for Simulations in the Canonical Ensemble. Mol. Phys. 1984, 52 (2), 255–268. (32) Parrinello, M.; Rahman, a. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52 (12), 7182–7190. (33) Basak, A.; Bateman, O.; Slingsby, C.; Pande, A.; Asherie, N.; Ogun, O.; Benedek, G. B.; Pande, J. High-Resolution X-Ray Crystal Structures of Human gD Crystallin 25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(1.25 Å) and the R58H Mutant (1.15 Å) Associated with Aculeiform Cataract. J. Mol. Biol. 2003, 328 (5), 1137–1147. (34) Jung, J.; Byeon, I. J. L.; Wang, Y.; King, J.; Gronenborn, A. M. The Structure of the Cataract-Causing P23T Mutant of Human gD-Crystallin Exhibits Distinctive Local Conformational and Dynamic Changes. Biochemistry 2009, 48 (12), 2597– 2609. (35) Kumaraswamy, V. S.; Lindley, P. F.; Slingsby, C.; Glover, I. D. An Eye Lens Protein-Water Structure: 1.2 A Resolution Structure of gammaB-Crystallin at 150 K. Acta Crystallogr. D. Biol. Crystallogr. 1996, 52 (Pt 4), 611–622. (36) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; Mackerell, A. D. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone Φ, ψ and Side-Chain χ(1) and χ(2) Dihedral Angles. J. Chem. Theory Comput. 2012, 8 (9), 3257–3273. (37) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14 (1), 33–38. (38) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26 (16), 1781–1802. (39) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N Ŋlog( N ) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98 (12), 10089–10092. (40) Grubmüller, H.; Heller, H.; Windemuth, A.; Schulten, K. Generalized Verlet Algorithm for Efficient Molecular Dynamics Simulations with Long-Range Interactions. Mol. Simul. 1991, 6 (1–3), 121–142. (41) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N-Alkanes. J. Comput. Phys. 1977, 23 (3), 327–341. (42) Martyna, G. J.; Tobias, D. J.; Klein, M. L. Constant Pressure Molecular Dynamics 26

ACS Paragon Plus Environment

Page 26 of 29

Page 27 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Algorithms. J. Chem. Phys. 1994, 101 (5), 4177–4189. (43) Feller, S. E.; Zhang, Y.; Pastor, R. W.; Brooks, B. R. Constant Pressure Molecular Dynamics Simulation: The Langevin Piston Method. J. Chem. Phys. 1995, 103 (11), 4613–4621. (44) Shaw, D. E.; Grossman, J. P.; Bank, J. A.; Batson, B.; Butts, J. A.; Chao, J. C.; Deneroff, M. M.; Dror, R. O.; Even, A.; Fenton, C. H.; Forte, A.; Gagliardo, J.; Gill, G.; Greskamp, B.; Ho, C. R.; Ierardi, D. J.; Iserovich, L.; Kuskin, J. S.; Larson, R. H.; Layman, T.; Lee, L. S.; Lerer, A. K.; Li, C.; Killebrew, D.; Mackenzie, K. M.; Mok, S. Y. H.; Moraes, M. A.; Mueller, R.; Nociolo, L. J.; Peticolas, J. L.; Quan, T.; Ramot, D.; Salmon, J. K.; Scarpazza, D. P.; Ben Schafer, U.; Siddique, N.; Snyder, C. W.; Spengler, J.; Tang, P. T. P.; Theobald, M.; Toma, H.; Towles, B.; Vitale, B.; Wang, S. C.; Young, C. Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC; 2014; Vol. 2015–January, pp 41–53. (45) Tuckerman, M.; Berne, B. J.; Martyna, G. J. Reversible Multiple Time Scale Molecular Dynamics. J. Chem. Phys. 1992, 97 (3), 1990–2001. (46) Shan, Y.; Klepeis, J. L.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Gaussian Split Ewald: A Fast Ewald Mesh Method for Molecular Simulation. J. Chem. Phys. 2005, 122 (5). (47) Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé–Hoover Chains: The Canonical Ensemble via Continuous Dynamics. J. Chem. Phys. 1992, 97 (4), 2635–2643. (48) Lippert, R. A.; Predescu, C.; Ierardi, D. J.; Mackenzie, K. M.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Accurate and Efficient Integration for Molecular Dynamics Simulations at Constant Temperature and Pressure. J. Chem. Phys. 2013, 139 (16), 164106. (49) Porcar, L.; Falus, P.; Chen, W.-R.; Faraone, A.; Fratini, E.; Hong, K.; Baglioni, P.; Liu, Y. Formation of the Dynamic Clusters in Concentrated Lysozyme Protein 27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Solutions. J. Phys. Chem. Lett. 2010, 1 (1), 126–129. (50) McGuffee, S. R.; Elcock, A. H. Atomically Detailed Simulations of Concentrated Protein Solutions: The Effects of Salt, pH, Point Mutations, and Protein Concentrations in Simulations of 1000-Molecule Systems. J. Am. Chem. Soc. 2006, 128, 12098–12110. (51) Velev, O. D.; Kaler, E. D.: Lenhoff, A. M. Protein Interactions in Solution Characterized by Light and Neutron Scattering: Comparison of Lysozyme and Chymotrypsinogen. Biophys. J. 1998, 75, 2682–2697.



28

ACS Paragon Plus Environment

Page 28 of 29

Page 29 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Graphical TOC Entry

29

ACS Paragon Plus Environment