OPEP6: A New Constant-pH Molecular Dynamics Simulation Scheme

May 6, 2019 - Massively parallel implementation of Steered Molecular Dynamics in Tinker-HP: comparisons of polarizable and non-polarizable simulations...
2 downloads 0 Views 661KB Size
Subscriber access provided by Stockholm University Library

Biomolecular Systems

OPEP6: A new constant-pH Molecular Dynamics simulation scheme with OPEP coarse-grained force field Fernando Luis Barroso da Silva, Fabio Sterpone, and Philippe Derreumaux J. Chem. Theory Comput., Just Accepted Manuscript • Publication Date (Web): 06 May 2019 Downloaded from http://pubs.acs.org on May 6, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

OPEP6: A new constant-pH Molecular Dynamics simulation scheme with OPEP coarse-grained force field Fernando Lu´ıs Barroso da Silva,∗,†,‡,¶ Fabio Sterpone,‡ and Philippe Derreumaux∗,§,k † Departamento de F´ısica e Qu´ımica, Faculdade de Ciˆencias Farmacˆeuticas de Ribeir˜ao Preto, Av. do caf´e, s/no. – Universidade de S˜ao Paulo, BR-14040-903 Ribeir˜ao Preto – SP, BRAZIL, Fax: +55 (16) 3315 48 80; Tel: +55 (16) 3315 42 22 ‡ Laboratoire de Biochimie The´orique, UPR 9080 CNRS, Institut de Biologie Physico Chimique, Universit´e Paris Diderot – Paris 7 et Universit´e Sorbonne Paris Cit´e, 13 rue Pierre et Marie Curie, 75005 Paris, France. ¶ Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27606, United States. § Laboratory of Theoretical Chemistry, Ton Duc Thang University, Ho Chi Minh City, Vietnam. k Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: [email protected]; [email protected] Abstract The great importance of pH for molecular processes has motivated the continuous development of numerical methods to improve the physical description of molecular mechanisms in computer simulations. Although rigid titration models are able to provide several useful information, the coupling between the molecular conformational

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

changes and the acid-base equilibrium is necessary to more completely model the pH effects in biomolecules. Previous reported convergence issues with atomistic simulations indicated that a promising approach would require coarse-grained models. By means of the coupling between the successful OPEP force-field for proteins with the fast proton titration scheme, we proposed a new protocol for constant-pH molecular dynamics simulations that takes the advantage of both coarse-grained approaches to circumventing sampling difficulties faced by other numerical schemes and also to be able to properly describe electrostatic and structural properties at lower cpu costs. Here, we introduce this new protocol that defines now OPEP6 and its pKa s benchmark for a set of representative proteins (HP36, BBL, HEWL, NTL9 and a protein G variant). In comparison with experimental measurements, our calculated pKa values have the average, maximum absolute and root-mean-square deviations of [0.3 − 1.1], [0.6 − 2.5] and [0.4 − 1.3] pH units, respectively, for this five studied proteins. These numbers are within the ones commonly observed when similar comparisons are done among different theoretical models and slightly better than the accuracy obtained by a rigid model using the same titration engine. For BBL, the predicted pKa are closer to experimental results than other analyzed theoretical data. Structural properties were tested for insulin where separation distances between the groups were compared and found in agreement with experimental crystallographic data obtained at different pH conditions. These indicate the ability of the new OPEP to proper describe the system physics and open up more possibilities to study pH-mediated biological processes.

KEYWORDS: protein titration, Constant-pH simulations, Tanford and Kirkwood model, coarse-grained models, protein electrostatics.

2

ACS Paragon Plus Environment

Page 2 of 43

Page 3 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Introduction The great importance of pH for biological, chemical, medical and technological systems has motivated the development of several theoretical methods to properly describe its effects. From quantum mechanical treatments to empirical descriptions, a diversity of methods is available in the literature (see refs. 1–3 for reviews). Among other things, 3 the choice of the ideal constant-pH (CpH) method depends on the characteristics of the studied system, the desired physical property to be quantified and the needed accuracy for a given cpu resource. The main classes of theoretical tools nowadays include empirical methods and Poisson-Boltzmann (PB) solvers, 4–8 Monte Carlo schemes, 9–12 Molecular Dynamics (MD) coupled with titration, 13–15 MD with the incorporation of the alternative free energy method in which the transformation between protonation/deprotonation states is treated as an additional dynamical degree of freedom in the simulation (i.e. the so-called “MD with λ dynamics”) 16–20 and ab initio MD. 21–23 Most of the numerical schemes simplified the problem invoking a rigid biomolecular description. 4,5,8–10,12 Despite the success of these methods in several biomolecular applications, 3,24–32 the physical systems were only partially described due to the fact that the strong protonation–conformation coupling was largely neglected. This constraint started to be removed by the combination of classical MD simulations with protonation numerical schemes. The pioneer work done by Baptista and co-authors 13 using a hybrid MD/PB approach started a new era of CpH simulation methods to allow completed pH-coupled MD studies. Many other labs follow either this idea or are rooted in the ”λ”’s dynamics method. 3,33 Titration plots, free energy of interactions and any other structural properties that are pH-dependent are routinely calculated by such methods with different levels of accuracy and technical difficulties. 1–3 The computer power is still a limiting factor for CpH simulations with atomistic and more sophisticated techniques. 3 Studies by different research groups have reported slow convergence properties for CpH MD methods. 18,34 Often, more computationally expensive sim3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ulations are necessary for convergence (10 ns as reported by Shen and collaborators, 35 or 40 ns for a simple dipeptide as quantified by Chen & Roux 34 ). While some researchers focus to solve these sampling issues, 18–20,34,36,37 others work on simplifying the models. 3,10,12,38,39 Coarse-grained (CG) models are always appealing to allow the simulations to cross barriers by reducing the model details to only the essential degrees of freedom of the system. Recently, working in this direction, a new version of CpH method combined a Cα structurebased model (SBM) with a simple titration scheme to study the folding of the N-terminal domain from the ribosomal protein L9 (NTL9) 38 and a protein G variant (Gprot). 39 Although such CG scheme might be useful to study electrostatic interactions during the refolding process for proteins whose structure is already available, its general application becomes questionable when the tertiary structure is unknown. Ideally, the protein force field parameters should not be specific for a given protein as it happens with G˜o-like models. 40 Moreover, crystallization conditions used to determine these parameters often have a severe ionic strength that might be unrealistic for applications of this kind of model on ordinary biological environments. Following the landmark of Baptista and co-authors 13 to better describe the strong protonation-conformation coupling for proteins, a more efficient strategy to deal with all these issues is to invoke a general CG force-field coupled with a fast titration numerical scheme. For the first, the Optimized Potential for Efficient Protein Structure Prediction (OPEP) force field that has been very successful to describe folding, aggregation, and other biophysical phenomena without any need to adjust the parameters for a specific given protein system 41–46 is a natural choice. Proteins with only the linear sequence known can be studied. Moreover, OPEP version 5 has incorporated ad hoc potentials for modeling ion-pair interactions to describe more realistically salt-bridges. 47 OPEP also makes use of an implicit solvent representation which is rather convenient for electrostatic models. For the acid-base equilibrium, the Fast Proton Titration Scheme (FPTS) 10 is able to overcome the sampling time and convergence limitations of atomistic methods with quite good accuracy for both

4

ACS Paragon Plus Environment

Page 4 of 43

Page 5 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

proteins 3,12 and nucleic acids. 48 For instance, in a correlation analysis between experimental and calculated pKa s for a set of proteins the linear correlation coefficient and slope between these pKa s for the FPTS are equal to 0.68 and 0.79, respectively, while for PropKa, 7 these numbers are slightly smaller, 0.51 and 0.60. 12 This was a significant achievement because PropKa tends to be more precise than popular PB solvers. 49 Another aspect is that the FPTS is also based on a CG representation of the system and a continuum solvent model which means that both OPEP and FPTS are in the same level of approximation for the solvent description. Therefore, as a proof of concept, we proposed a new protocol for a CpH MD simulation: by running a classical MD simulation with OPEP5 force field with periodic external calls to the FPTS, and by updating the ion-pairs interactions of the titratable aminoacids according to the protonation state returned by FPTS. This protocol is analogous to the one developed by Baptista and collaborators. 13,50 In short, the main differences are a) the replacement of an atomistic force field by the CG OPEP force field, b) the use of the FPTS instead of the linear Poisson-Boltzmann solver (see refs. 3,12 for a discussion regarding the pros and cons), c) solvent is here treated in equally bases during both the MD sampling and the titration. Doing that, the strong protonation-conformation coupling is completely taken into account during the MD run with a very small impact on the cpu performance. Consequentially, the dynamical changes on the local protein environment (e.g. the interaction with other charged groups) now do affect the protonation/deprotonation process and vice-versa without convergence problems. This gives more realism to the model because pH is intimately related to biomolecular charge and, therefore, it should control both intra and inter biomolecular interactions which affect the protein conformation. Furthermore, the important protein charge fluctuation mechanism that is at the origin of the attractive mesoscopic forces between macromolecules in solution (as predicted by the Kirkwood-Shumaker theory 51 ) is incorporated in a complete manner in this protocol. Such more realistic approach might also play an important role for applications in biomolecular complexation studies where this “charge regulation mechanism” is fundamental to explain the experimental observations

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 43

particularly at low salt and at pH regimes closer to pI (the isoelectric point). 26,52–54 Therefore, this work is an important step forward to allow a more realistic physical description of challenging systems as the pH effect on the formation of amyloid fibrils

55–57

and any

other aggregation process (including new recognized diseases linked processes as observed for preeclampsia 58 and amyotrophic lateral sclerosis 59 ) where a large number of titratable groups are present. Other CpH molecular simulation methods are either still prohibitive in this context even with massive computational high performance resources due to their slow convergence characteristics 3,34,35 or dependent on previous availability of the protein tertiary structure which vastly limit their applicability. This paper is organized as follow. We first introduce the proposed modifications on OPEP5 and the hybrid OPEP/FPTS protocol, which is then followed by a presentation of the main convergence features and the pKa benchmarking with the other theoretical methods and the experimental data. The proteins [the thermostable actin binding 36-residue subdomain of chicken villin headpiece – HP36, the 45-residue binding domain of 2-oxoglutarate dehydrogenase multi-enzyme complex – BBL, the 129-residue hen egg white triclinic lysozyme – HEWL, NTL9 and a mutant Gprot] used in this study were selected primarily to allow the present calculations to be compared to results obtained by other theoretical methods including our own previous work with the original FPTS (using a rigid protein structure). 10 We are going to refer to this previous method as FPTSrigid (no coupling with the dynamical protein conformational changes) from now on. For the new hybrid OPEP/FPTS protocol, we will name hereon OPEP6. Insulin – INS was also investigated to let us access the structural features of the new protocol by the comparison of the theoretical conformations at different solution pHs with crystallographic data at the same conditions.

6

ACS Paragon Plus Environment

Page 7 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Model and Methodology Both the OPEP force field functional and its parameters for several versions have been well described in the literature. 41–47 The FPTS was also portrayed in details in refs. 3,10,12,48 Here, we briefly review their main ideas for better understanding of the new OPEP6 based on the combination of the modified OPEP5 force field with FPTS.

The OPEP5 force field The CG model OPEP represents each amino acid by an atomistic description of the backbone (N, HN, Cα , C, and O atoms), while one bead is used for each side chain. An exception is proline defined by all its heavy atoms. The OPEP force field is expressed as a sum of local, non-bonded, and hydrogen bonding terms. The local terms describing bond stretching, angle bending, and proper and improper dihedral torsions use standard functional forms as in atomistic force fields. The non-bonded terms describe attractive and repulsive interactions between the center of forces. OPEP5 is an implicit water model, free of explicit electrostatic Coulombic interactions. Two-body and four-body H-bond potentials are added however to mimic the backbone secondary structure propensity for alpha helices and beta-sheets. In the version 5 of the force field specific salt-bridge potentials between ionic side chains (Arg and Lys with Glu and Asp) (V IP ) replace the standard Lennard-Jones like potentials for those pairs (V LJ ). A complete description of the force field is given in refs. 45 and. 47

The fast proton titration scheme The FPTS 10 is rooted in the classical Tanford-Kirkwood (TK) model 60,61 together with a phenomenological description of pH (i.e. pH becomes a simple input parameter in this approach). It also invokes a modified Debye-H¨ uckel screening length (κc ) as suggested by Beresford-Smith and coworkers. 62 This interpretation of κc as a scaling parameter for the Coulomb shielding describes better electrostatic interactions in the protein system. 63 As

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 43

a consequence, it became dependent on the net charge of the protein and therefore on the protonation state of the aminoacids. A detailed physical chemical formalism for this titration scheme and its possible weaknesses have been given in ref. 12 The core of the titration process is modeled by 10

∆wF P T S = ∆wT K + ∆GpH

(1)

with N

wT K

p Zp2 κc i e2 h X zi zj − = 4π0 s i>j rij 2(1 + κc b)

(2)

and

∆GpH = λ

ln10 (pH − pKa) β

(3)

where e is the elementary charge (e = 1.602 × 10−19 C), 0 is the the vacuum permittivity (0 = 8.854×10−12 C 2 /N m2 ), s is the solvent dielectric constant, Np is the number of protein PN ionizable sites with valency zi , Zp = i p zi is the total protein net charge, β = 1/KB T [KB (= 1.3807 × 10−23 J.mol−1 .K −1 ) is the Boltzmann constant and T is the temperature (in Kelvin)], λ equals either −1 (protonation) or +1 (deprotonation) and b is assumed to be equal to the radius of a sphere that inscribes the protein (Rs ). 10 The intrinsic pKa values are taken from experimental data for ”isolated“ amino acids at a given experimental condition. We will refer to this equilibrium constant of the amino acid model compounds as pK0 and their values were taken from the experimental work done by Nozaki & Tanford. 64 In comparison with OPEP, the FPTS involves a higher level of granularity to model the protein where the whole amino acid residues are represented by individual charged van der Waals particles of radii (Ri ) and valences zi . This mesoscopic description of the amino acids as single beads reduces the possibilities to capture differences between the amino acid side-chain conformations. Rotamers tend to have similar coordinates when the amino acids 8

ACS Paragon Plus Environment

Page 9 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

are reduced to a single spherical object. 12 The drawback is that molecular configurations with small conformational differences will end up with an identical structural mapping in the single bead description and, as a consequence, with the same electrostatic behavior. Conversely, it is beneficial to the coupling with the MD engine decreasing the necessary frequency of external calls and requiring much less cpu time. This is another advantage of this CpH method to avoid sampling problems as observed in more detailed models. Values for Ri were taken from ref. 28 All values of the valences zi for titratable groups [Glutamic acid (GLU), aspartic acid (ASP), tyrosine (TYR), cysteine (CYS) not involved in SS bridges, lysine (LYS), histidine (HIS), argine (ARG) and the C (CTR) and N (NTR)] are defined as a function of the solution pH and can vary during the simulation. Valences vary between −1 and 0 or 0 and +1, for acid and basic amino acids, respectively. See the correspondent acceptance criterion in the next subsection, where it is explained how the protonation states are allowed to change according to the solution pH, salt conditions and the presence of other charged particles. Unless differently stated, all other beads are assigned with a constant zero charge. Although this feature was not explored in this present work, the method gives the possibility both to introduce fixed charges for any heteroatom and even to lock the protonation state for specific residues during all the run in a particular state, if justified by experimental motivations, or if it is interesting to investigate its electrostatic coupling with other titratable groups.

OPEP6: A coupled modified OPEP5 force field with proton titration The coupling of OPEP5 force field with the FPTS follows the idea of Baptista and coauthors 13,50 to combine a MD engine with a titration predictor. In their case, the GROMOS96 force field was used for the protein conformational sampling while the linear PB solver MEAD 4 was employed for the electrostatic calculations. Here, we made use of the OPEP5 force field and the FPTS, i.e. the MD engine explores the configuration space using 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

D a modified OPEP5, and passes the coordinates of all protein beads ({rM }) to the FPTS at k

fixed time intervals during the MD run. In its turn, the FPTS will calculate all the charges of the titratable groups for this specific molecular configuration at the studied physical chemical environmental, and return to the MD engine the new average valences ({< z >Fk P T S }). A scheme of the simulation protocol can be seen in Fig. 1. The structure of the ion-pairs of OPEP5 simplifies the hybrid OPEP/FPTS implementation without the need to reparametrize the protein force field. These {< z >Fk P T S } values are equivalent to protonation fractions and should control the ad hoc ion-pair interactions. The idea behind it is quite simple: when an ionizable amino acid becomes neutral due to the solution pH, it can not be involved in an ion-pair. In this case, this specific ion-pair interaction should vanish and not contribute with the total interaction energy. Otherwise, it is used as in the original OPEP5. {| < z >Fk P T S |}’s are real numbers from 0 to 1. Therefore, a probabilistic protonation state assignment is used to define the the probability for the ion-pairs interaction terms to be taken into account or assumed to be zero. This is dynamically changed during the MD run and controlled by the FPTS. For acid amino acids, the ion-pairs interactions are included in the effective Hamiltonian when they are deprotonated (i.e. negatively charged). Conversely, for basic amino acids, only protonated ones (i.e. positively charged) will have the ion-pairs interaction term contributing with the total system interactions. In the original OPEP5 formulation, only ARG, LYS, ASP and GLU were considered for ion-pairs. Following a conservative strategy with the compromise to only introduce the real fundamental necessary modifications on the force field, we have now introduced for OPEP6 an estimated additional ad hoc term for the HIS+ ...GLU− and HIS+ ...ASP− pairs. As a first approximation within the framework of a proof of principle of a computationally efficient strategy to pH effects on biological macromolecules, we have used the same functions available for LYS. This was defined based on the similaries between the potentials of mean force for the LYS+ ...GLU− (collinear approach) and HIS+ ...GLU− (single protonated HIS)

10

ACS Paragon Plus Environment

Page 10 of 43

Page 11 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

obtained by MD simulations with the polar hydrogen CHARMM19 force field. 65 This approximation can be replaced on the future by more realistic effective pair potentials obtained at different salt concentrations in order to eventually improve the results. An explicit repulsive contribution term for like-charged residues (e.g. LYS+ -LYS+ , LYS+ -ARG+ , GLU− -GLU− , etc.) that do not form ion-pairs (interaction term explicitly absent in the original OPEP5) was also now added in the ad hoc ion-pairs functional by means of a salt-dependent screening coulombic function:

Urepulsive = 332

zi zj e−κc rij s rij

(4)

This additional new repulsive term in OPEP6 is useful to contribute to destabilize folded/stable conformations at solution pHs apart from the pI letting the chain to expand. It partially captures the expected physical behavior at these conditions. Since the introduction of this repulsive term expanded the idea of ion-pairs from OPEP5 to a more general ion-ion interactions, we will call them from now on ”ion-ion“ terms. For the protonation/deprotonation process in the context of the new OPEP6, Eq. 1 is converted into the following Metropolis Monte Carlo (MC) protocol: 10

D 1. Before the first titration step during the MD run, the CG protein structure {rM } is k

provided by the MD engine with the modified OPEP5 force field using the additional ad hoc terms above described and converted into the FPTS single bead description. This protein conformation is kept fixed during all the MC steps. This implies that during the MC run there will be no change in the protein configuration. For the sake of simplicity, bond lengths, angles and dihedral angles are kept fixed. 2. A titratable site is chosen by random. 3. If protonated, we try to move the proton charge to the bulk solution (deprotonation process). If deprotonated, we try to move the proton charge from the bulk solution to the site (protonation process). κc should be updated to reflect the corresponding changes in the 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 43

number of counter-ions in the electrolyte solution (see refs. 3,10 ). In an analogous manner, Zp is also updated for the appropriate protein net charge after the protonation/deprotonation process. 4. This trial charge movement is accepted with the probability

  min 1, e(−β∆wF P T S )

5. The process is repeated from step 2 until the convergence is reached. Typically 105 steps are more than enough for convergence even for systems with multiple titratable sites present in the protein structure, as shown in previous studies. 3,48 6. After convergence of the FPTS, the new average partial charges for all amino acids {< z >Fk P T S } are passed back to the MD engine where they are converted into probabilities, defining if a particular group should be protonated or not. Consequentially, the ion-ion interactions (ion-pairs contributions plus the repulsive term given by Eq. 4 ) are specified as “on” or “off” for the MD engine depending on the probabilistic protonation state assignment. In mathematical terms, the correspondent change in the OPEP Hamiltonian is only at the non-local contributions related with the ion-pairs:

Vnon−local (i, j) =

     

V IP

, if i and j are protonated with opposite charges

Urepulsive ,     

V LJ

if i and j are protonated with like-charges

,

,

(5)

otherwise

where V IP indicate the ad-hoc potential added in the version 5 of the force field to describe ion-pairing between opposite charged side-chains and derived from atomistic simulations via the iterative Boltzmann inversion procedure, and V LJ indicate the non specific LennardJones like potential parametrized in the version 4 of the force field. The ionization state of each titratable group ({xFk P T S }) is recorded for further analyses. They are used to compute 12

ACS Paragon Plus Environment

Page 13 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the average protonate state of the amino acids ({< x >Fk P T S }) from the coupled modified OPEP5/FTPS simulation at any given pH. Subsequently, they are converted to the average partial charge for each ionizable group (< zaa >) as a function of pH.

7. The MD engine continues to run with the updated protonation state until a new external call to the FPTS is going to be made. Note that other physical chemical properties such as the total protein charge, the protein charge fluctuation parameter (also known as protein charge capacitance) and the total dipole moment could also be directly obtained. 10 If the MD engine always (hypothetically) produces an identical protein conformation, all these average properties will be the same as of the ones obtained by the FPTSrigid after a statistically significant events of external calls to the FPTS using OPEP6. Here, we will focus on the average charge of each amino acid (< qaa >=< zaa > e) and molecular structural properties as the separation distance between amino acids. A set of these charges obtained at several pH conditions gives a theoretical titration plot. The pKa value for each individual titratable amino acid at a particular micro-environment (specified by the neighbor charges and salt conditions) was calculated as the pH condition where this residue is half-protonated (e.g. < zi (pH) >= +0.5, for HIS). This corresponds to the solution pH where the titratable site is 50% occupied by the proton. In theory, it is necessary to repeat the simulations many times varying solution pH from 1 to 14 using small intervals calculating at each pH the average charge numbers for all titratable residues ({< zi (pH) >}). 3 Alternatively, this procedure can be simplified at the price of loosing accuracy by means of solely considering a couple of points (e.g. {< zi (pH) >} at pH 5, 6, 6.5, 7, and 8 as reported in ref. 15 ) and fitting them to the Hill equation. The drawback of this approximation appears when the pKa s are outside the used interval of pH (e.g. pKa = 3.2 for a interval of pH 5–8). For the sake of caution, we adopt here a more conservative approach considering all points from 3 to 11 using 0.1 pH unit of interval without any

13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

fitting procedure. This interval was similar with the one adopted before by us 3,12,48 which allow for a comparison between FPTSrigid and OPEP6 on equal basis. To save cpu time, we have reduced here the pH window from 1–14 to 3–11 because we knew in advance the experimental pKa s. Therefore, we could focus the simulations on the proper pH window reducing the needed number of runs at different solution pH. In some cases, when only acid amino acids were analyzed, we have narrowed down this pH range from 2.5 to 6 because the experimental pKa s were known to be from 3 to 5. In all cases, the interval was 0.1 pH units.

Proteins Proteins were mainly selected based on the availability of both experimental and theoretical pKa data. On the top of that, another used criterion was the diversity on the numbers of both the titratable sites (ntitra ) and the theoretical tools previously employed. This also allowed us to compare them with OPEP6. Obeying these criteria, we have used in pKa benchmarks HP36 (PDB id 1VII, ntitra = 10), BBL (PDB id 1W4H, ntitra = 15) and HEWL (PDB id 2LZT, ntitra = 27). They have been studied before by the FPTSrigid , 12 the generalized Born (GB) implicit-solvent approach 16,66 and the all-atom CpH Molecular Dynamics (CpHMD) simulations with the pH replica-exchange protocol (REX-CpHMD). 35 Another group of studied proteins was composed by NTL9 (PDB id 1CQU, ntitra = 18) and the Gprot variant (Gprot-QDD) with mutations T2Q, N8D and N37D (PDB id 1PGB was used to build the mutant – see bellow, ntitra = 18). These two proteins were studied before by other authors using G˜o-like models 38,39 so including them we can cover a larger set of pKa results from different theoretical tools and a diversity of modeling levels. For the structural analysis of the protein configurations produced by OPEP6 trajectories, we have employed INS (PDB id 1BPH, ntitra = 12). The conformational changes in this protein were experimentally explored in the pH range 7 − 11 67 which allows a comparison of its main characteristics. All these six proteins coordinates were obtained from the RCSB Protein Data Bank (PDB). 68 They were submitted to the OPEP files generator server 69 that converted 14

ACS Paragon Plus Environment

Page 14 of 43

Page 15 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1: The simulation work-flow scheme used for OPEP6 coupling protein conformation with a titration scheme. N and M steps are typically equal to 5.103 (with a 1.5 fs time step) and 105 steps for the MD (with the modified OPEP5 where both the ad hoc ion-pairs for HIS+ ...GLU− and HIS+ ...ASP− and the repulsive term among like-charged residues were added) and the MC (used in the FPTS), respectively. See the text for more details. them into the OPEP force field description and provided additional parameters files for the MD engine. When the experimental structure from the PDB was obtained by NMR, we used the first low-energy NMR solution structure (model 1) as the initial experimental conformation to start our simulations. For protein structures determined by crystallographic, when atoms or a residues with alternate sites appeared in the structure, the coordinates with 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the alternate location indicator ”A“ were selected. The mutant Gprot-QDD was prepared with ”UCSF Chimera 1.11.2” replacing the amino acids T2, N8 and N37 to correspond to Q, D and D, respectively (T2Q, N8D and N37D). Chimera default parameters were used together with 1, 000 steps of the steepest descent minimization algorithm. 70

Simulations details We have used the same simulation parameters for all protein systems except the salt concentration that was specific for each of them to mimic the experimental conditions. pH was also specific for each simulation run. The MD runs were performed with the modified OPEP5 force field as described above in a spherical box (radius 120 ˚ A) at 298 K with a time step of 1.5 fs using the RATTLE algorithm and the Berendsen thermostat. 44,47 After an initial thermalization without external calls to the titration scheme, the simulation with the MD coupled with the FPTS was initiated. Production runs with not less than 10 ns were carried out using OPEP6. The simulation length was defined accordingly with the averaged charge convergence. At least three independent runs (replicas) were performed with each protein system using a different seed for the initial velocity distribution. This was useful to avoid the biases of a particular protein conformation being trapped in a local minimum during a MD run without the need of enhanced sampling techniques. Typically, the number of MD steps between two external calls to the FPTS (N ) was equal to 5.103 steps for the 1.5fs time step, while the number of MC steps in each FPTS call (M ) was 105 steps. Separation distance measurements and the root-mean-square deviation (RMSD) analysis were done using VMD package (version 1.9.1). 71 The FPTS was solved using standard Metropolis Monte Carlo (MC) simulations. 72,73 The aqueous solution dielectric constant was fixed at  = 78.7 (for T = 298K). Solution pH was defined together with all input parameters of the MD engine, and kept constant. The salt concentration (cs ) was given by the corresponding experimental data: (a) HP36, 150 mM, (b) BBL, 200 mM, (c) HEWL, 50mM, (d) NTL9, 100 mM, (e) Gprot-QDD, 100 16

ACS Paragon Plus Environment

Page 16 of 43

Page 17 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

mM, (f) INS, 100 mM. Calculations were performed with a modified version of the Faunus biomolecular simulation package, 74 where the FPTS was already implemented. 10 Additional calculations were performed with the FPTSrigid (with no coupling between MD and titration) when data was not available for a given protein system. We used the same simulation details as reported before for them. 12 Since PropKa is a reference for pKa benchmarks, we also carried out calculations with PropKa (version 3.1) and default parameters 7 for all protein systems. Available published data obtained by other well-known theoretical methods were always reported. Tyrosine’s titration is not often included in other methods. 38,39 Therefore, we performed some calculations assuming a non-ionizable TYR in order to evaluate its effects. These simulations with a constant neutral TYR during all the simulation runs are referred as TY0 in this manuscript. All calculations performed with OPEP6 were also done with TY0. The ”null model“ data was not included in our outcomes due to the fact that predictions obtained by the FPTS were already compared to it and in the large majority of the studied protein and RNA cases were found more accurate. 3,12,48 For tests purposes, additional MD simulation runs with a decoupled titration-protein conformational simulation were also performed in some systems to evaluate the impact of the coupling. In these simulations [labeled FPTSrigid (MD)], we repeated identical runs as those done by the OPEP6 with the difference that the protonation states did not affect the ion-pairs contributions and as such the MD sampling with OPEP5 force field. Protein conformations produced by these runs were solely generated by OPEP5 without any effect of the titration scheme. On the other hand, the titration could explore a set of conformations given by the snapshots of the MD simulation at the same time intervals used in the OPEP6 calculations. This procedure employed here for tests is analogous with the one used before by us where the MD was applied as a simple mean to generate an ensemble of protein structures to compute physical chemical properties. 32,75

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Results and discussion Before we start presenting the results of the pKa benchmarks for the studied protein systems, we show representative examples of titration plots for the amino acids (Fig. 2) and the convergence of the averaged protonation states as a function of the simulation time (Fig. 3). In Fig. 2a, a characteristic titration plot observed in the simulations where a good agreement with the experimental data is obtained is given for BBL-ASP145. The predicted pKa by OPEP6 is 3.64(19) for this amino acid while the experimental data is 3.65(4) (see table 2). For the sake of comparison, a titration plot from a FPTSrigid simulation (predicted pKa = 3.7) is included in the same figure. In general, the OPEP6 data follows the results from the rigid model. The maximum deviations between the two titration schemes for specific pH values is ≈ 0.3 pH units although the difference in the predicted pKa is less than 0.1. The spread of points seen in the averaged charges obtained by the different OPEP6 independent runs as a function of pH indicates the diversity of protein conformations present in the MD trajectory and that the titration scheme captures and explores these differences. Conversely, we report in Fig. 2b a less successful prediction where larger deviations can be seen between the experimental pKa and the theoretical one. In this case (BBL-GLU141), the experimental result is 4.46 35 while the predicted values by OPEP6 and FPTSrigid are 3.87(2) and 3.8, respectively. There is a marginal improvement but the difference between the theoretical and experimental results remains keeping the same trend found in the FPTSrigid runs. Moreover, the < z >OP EP 6 follows a different sigmoid plot (Hill equation) for this amino acid. Consequentially, it reflects a different Hill coefficient and degree of cooperativity between the sites. The coupling conformation-protonation has introduced pH effects on the the structural properties of the system at different pH values. We have observed cases like the ones shown in these figures in the simulations with the other protein systems. We next illustrate in Fig. 3 a typical example of the convergence of the averaged protonation states as a function of the simulation time. In this example, the results are for HP36 using OPEP6. This data in terms of < x > is equivalent to averaged valences (charge 18

ACS Paragon Plus Environment

Page 18 of 43

Page 19 of 43

ASP145

GLU141 0

-0.2

-0.2

-0.4

-0.4

0



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

-0.6

-0.6

-0.8

-0.8

-1

-1

5 pH

5 pH

Figure 2: Titration plots for amino acids. Averaged charges (< z >OP EP 6 ) as a function of the solution pH. Data for BBL at 200 mM NaCl from five independent CpH MD runs with OPEP6. The initial protein configuration was the first NMR structure available in the PDB file (PDB id 1W4H). Filled circles are the data obtained by OPEP6. Each color represents one different simulation replica. Data from the FPTSrigid simulations with no coupling between titration and protein conformations are given by the black dashed lines. 12 The experimental pKa values are included in these plots and represented by a single bold ”X”. 35 The experimental error bars are quite small (0.04) and therefore are not given in these plots (see also table 1). a) Left: ASP145 [pKa (OPEP6)=3.64(19)]. b) Right: GLU141 [pKa (OPEP6)=3.87(2)]. numbers) which can be easily obtained for acid amino acids simply subtracting one unit from the protonation states (for GLU and ASP, < z > = < x > − 1). In Fig. 3a, the data is given for all amino acids at pH 4.5. This is the pH that corresponds to the predicted pKa for GLU45 at this salt concentration (see table 1). As such, this is the only case where < x > is converging to 0.5. We remember that the pKa of a titratable residue at a given microenvironment is defined in a physical chemical description as the pH where this amino acid is half-protonated (i.e. < x > = 0.5). All the other ionizable residues have their protonation states converged to a different number due to the differences in their pKa s. It can be seen that for most of cases the protonation states (or the averaged charge numbers) converge to a plateau after a few ns (less than 4 ns for ASP46 and GLU72) at this environmental condition. Some titratable groups seem to need a bit longer runs at this solution pH. This

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 43

Table 1: Calculated and experimental pKa values of HP36. Salt concentration is 150 mM. a Data taken from ref. 35 b Sampling was based on 1 ns. c The 10ns trajectory was broken in two halves (0-5 and 5-10 ns). d Data taken from ref. 12 e The initial 0-5 from the 24 ns trajectories is also given for the comparison with the all-atom REX-CpHMD simulations. The data for OPEP6 is the average from 10 independent 24 ns trajectories. Estimated standard deviations were obtained from the variation the average of the pKa values from this set of independent runs and are given between parenthesis. MAX, AAD and RMSE mean the maximum absolute, the average absolute and the root-mean-square (RMS) errors on pKa values, respectively, in this table and the next ones. Residue Asp44 Glu45 Asp46 Glu72

Experimenta 3.10(1) 3.95(1) 3.45(12) 4.37(3) MAX AAD RMSE

GBa 0-1b 3.2(1) 3.5(1) 3.5(1) 3.5(1) 0.9 0.4 0.5

All-atom REX-CpHMDa,c 0-5 5-10 0-10 2.0 3.0 2.6(5) 4.3 4.5 4.4(1) 2.4 3.7 3.1(6) 4.4 4.4 4.4(0) 1.1 0.6 0.5 0.6 0.2 0.3 0.8 0.3 0.4

propKad

FPTSrigid d

3.78 4.57 4.08 4.43 0.7 0.5 0.6

3.7 4.5 3.8 3.5 0.9 0.6 0.6

OPEP6e 0-5 0-24 3.81(8) 3.68(26) 4.47(35) 4.51(14) 3.74(26) 3.69(15) 3.71(8) 3.76(2) 0.7 0.6 0.5 0.5 0.6 0.5

is the case of GLU45 that required 14 ns at pH 4.5. In Fig. 3b, the data for each amino acid was obtained in a different pH (3.7, 3.8 and 4.5) accordingly to their predicted pKa s (see table 1). In this case, they all converged to < x > ≈ 0.5. We noted that the convergence took longer simulation time at this solution pH even for titratable residues that had a faster convergence when the simulation ran at pH 4.5. This is an indication that OPEP6 convergence depends both on the amino acid and the studied physical chemical conditions. Similar results are observed for the others simulation replicas in terms of the convergence time that < x > needs to reach a plateau. Depending on the independent run, the converged < x > value can be slightly different than 0.5. This spread of the values for < x > ≈ 0.5 let us estimated the errors in the predicted pKa s. In the next subsections, we report the results of the pKa benchmarks for the studied protein systems. These outcomes are organized as a function of the available data found in the literature for these protein systems. After, we discuss the structural features of OPEP6 generated configurations at different pH tested with insulin (INS). All the comparisons here are aimed to validate the new OPEP6 to properly describe the pH-controlled protein pro20

ACS Paragon Plus Environment

Page 21 of 43

Protonation states

Protonation states

pH 4.5

pH ≈ pKa

0.5

1 Asp44 Glu45 Asp46 Glu72

0.4

Asp44 (pH 3.8) Glu45 (pH 4.5) Asp46 (pH 3.7) Glu72 (pH 3.8)

0.8

0.6

0.3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0.2

0.4

0.1

0.2

0 0

4

8 12 16 simulation time (ns)

20

0 0

24

4

8 12 16 simulation time (ns)

20

24

Figure 3: Numerical convergence of the average protonation states for four titratable amino acids of HP36 (ASP44, GLU45, ASP46 and GLU72) as a function of the simulation time. Data for HP36 at 150 mM NaCl from a single independent CpH MD run with OPEP6. The initial protein configuration was the first NMR structure available in the PDB file (PDB id 1VII). a) Left: Simulations at pH 4.5 which corresponds to the experimental pKa of GLU45 in this example. b) Right: Each curve was obtained at the pH correspondingly to roughly the predicted pKa of the titratable groups. cesses. Before the conclusions, we comment about the cpu performance of this new protocol in comparison with its previous version for all studied system.

pKa calculation for HP36 Table 1 lists the pKa results obtained by OPEP6 and other theoretical methods (GB, REXCpHMD, PropKa and the FPTSrigid ) together with the experimental data. Excluding the initial 5 ns of the all-atom simulation with REX-CpHMD, all the theoretical tools give similar RMSE values (0.4 − 0.6). The results from OPEP6 are slightly better than the previous rigid titration model (FPTSrigid ) which confirms that the coupling between the conformational changes and titration slightly improves the physical description of the system. The titration behavior of Asp44, Asp46 and Glu72 became closer to the experimental data while Glu45 remains essentially the same. All the three descriptors improved for OPEP6 for the longer trajectory sampling (from 0 − 24 ns): MAX, AAD and RMSE decrease from 0.9 to 0.6, 0.6

21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

to 0.5 and 0.6 to 0.5, respectively. Even for the short trajectory sampling (from 0−5 ns), the MAX and AAD were already slightly improved. Moreover, a real improvement can be seen if the results are compared with calculations using protein structures generated by standard MD simulations where no coupling titration-protein configuration is employed. For instance, if the calculations were done using the FPTSrigid with the input protein conformation taken as the most populated clustered protein conformation from a cluster analysis of a non-CpH MD trajectory (obtained with OPEP not coupled with a titration scheme), the RMSE is as high as 1.2 indicating that the coupling contributes to approach the generated protein configurations to the real ones. The OPEP6 achieved similar performance of GB and PropKa in terms of the RMSE values and a smaller maximum deviation. Longer simulations tend to increase the accuracy of OPEP6 as it does for the other protocols where MD is coupled with a titration scheme. Both the results for OPEP6 and the all-atom REX-CpHMD methods were less accurate for the initial part of the MD trajectories (from 0−5 ns). This can be easily seen analyzing the results already discussed for the convergence of the averaged protonation states (see Fig. 3). It is necessary to have a number of statistically significant events of protonation/deprotonation to obtain reasonable charge numbers. The conformational rearrangements involving changes in the proteins protonation states requires longer runs for convergence. We can also note that the estimated errors are larger for OPEP6 in comparison with FPTSrigid which as well indicates the diversity of the protein conformations explored during the titration by OPEP6.

pKa calculation for BBL The 45-residue binding domain of 2-oxoglutarate de- hydrogenase multi-enzyme complex was the protein system where the smallest MAX, AAD and RMDS values were achieved with the FPTSrigid . Experimental pKa s and the corresponding values calculated by the GB, all-atom REX-CpHMD, PropKa, FPTSrigid and OPEP6 methods are compared in table 2. Results obtained by OPEP6 using the full trajectory during 30ns of sampling are similar to FPTSrigid 22

ACS Paragon Plus Environment

Page 22 of 43

Page 23 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 2: Calculated and experimental pKa values of BBL. Salt concentration is 200 mM. a Data taken from ref. 35 b Sampling was based on 1 ns. c The 10ns trajectory was broken in two halves (0-5 and 5-10 ns). d Data taken from ref. 12 e The initial 0-5 from the 30 ns trajectories is also given for the comparison with the all-atom REX-CpHMD simulations. The data for OPEP6 is the average from 5 independent 30 ns trajectories. Estimated standard deviations were obtained from the variation the average of the pKa values from this set of independent runs and are given between parenthesis. Residue BBL Asp129 Glu141 His142 Asp145 Glu161 Asp162 Glu164 His166

Experimenta

3.88(2) 4.46(4) 6.47(4) 3.65(4) 3.72(5) 3.18(4) 4.50(3) 5.39(2) MAX AAD RMSE

GBa 0-1b

All-atom REX-CpHMDa,c 0-5 5-10 0-10

3.2(0) 4.3(0) 7.1(0) 2.8(2) 3.6(3) 3.4(3) 4.5(1) 5.4(1) 0.9 0.3 0.5

2.2 4.0 5.9 3.0 4.2 2.9 5.7 4.4 1.7 0.8 0.9

3.2 4.4 5.8 3.1 3.9 3.5 4.6 4.4 1.0 0.4 0.5

2.7(5) 4.2(2) 5.8(0) 3.1(0) 4.0(2) 3.2(3) 5.2(6) 4.4(0) 1.2 0.6 0.7

propKad

FPTSrigid d

3.68 4.51 6.37 3.76 4.59 2.32 4.54 5.78 0.9 0.3 0.5

3.7 3.8 6.5 3.7 4.1 3.2 3.8 6.0 0.7 0.3 0.4

OPEP6e 0-5 0-30 3.70(8) 3.86(3) 6.10(5) 3.74(5) 4.17(3) 3.31(6) 3.83(4) 6.46(5) 1.1 0.4 0.5

3.67(5) 3.87(2) 6.25(12) 3.64(19) 4.14(3) 3.28(8) 4.07(22) 6.14(26) 0.8 0.3 0.4

with MAX, AAD and RMSE equal to 0.8, 0.3 and 0.4, respectively. We believe this behavior observed for BBL and other proteins is due to the coarse-grained model used in the titration scheme that is not so sensitive for small protein conformational changes. 48 Also the balance between the attractive and specially the repulsive contribution approximations should be revised to improve the OPEP6 accuracy. A perfectly agreement was found for ASP145 as already seen in Fig. 2a. HIS166 seems to be a more difficult case to be reproduced even when the dynamics is coupled with the titration scheme. The discrepancy between the predicted pKa and its experimental value increased for this amino acid from 6.0 to 6.14(26) when comparing the rigid model with OPEP6 (the experimental pKa is 5.39(2)). A larger diversity of protein conformations was sampled as evidenced by the estimated error (0.22−0.26). Both HIS166 and a neighbor amino acid (GLU164) have higher estimated errors probably due to an electrostatic coupling in their protonation mechanisms. Something similar is also seen for the pair HIS142-ASP145 (estimated error ≈ 0.12 − 0.19). During the first 5 ns of simulation,

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 43

Table 3: Calculated and experimental pKa values of HEWL. Salt concentrations is 50mM. b Sampling was based on 1 ns. c The 10ns trajectory was broken in two halves (0-5 and 5-10 ns). d Data taken from the Supplementary material of ref. 39 e Data taken from ref. 12 f The initial 0-5 from the 30 ns trajectories is also given for the comparison with the all-atom REX-CpHMD simulations. The data for OPEP6 is the average from 5 independent 30 ns trajectories. Estimated standard deviations were obtained from the variation the average of the pKa values from this set of independent runs and are given between parenthesis. Residue HEWL Glu7 His15 Asp18 Glu35 Asp48 Asp52 Asp66 Asp87 Asp101 Asp119

Experimenta

2.6(2) 5.5(2) 2.8(3) 6.1(4) 1.4(2) 3.6(3) 1.2(2) 2.2(1) 4.5(1) 3.5(3) MAX AAD RMSD

GBa 0-1b

All-atom REX-CpHMDa,c 0-5 5-10 0-10

2.6(1) 5.3(5) 2.9(0) 4.4(2) 2.8(2) 4.6(0) 1.2(4) 2.0(1) 3.3(3) 2.5(1) 1.7 0.5 0.9

3.6 5.1 2.5 8.5 0.1 5.4 0.6 0.8 6.1 3.0 2.4 1.3 1.4

3.4 5.1 3.3 8.7 1.1 5.6 0.8 2.1 5.7 3.3 2.6 0.9 1.2

3.5(1) 5.1(0) 2.9(4) 8.6(1) 0.6(6) 5.5(1) 0.3(7) 1.5(7) 5.9(2) 3.2(1) 2.5 1.0 1.2

SBMd

propKae

FPTSrigid e

3.2 5.1 2.3 3.2 3.5 3.0 3.1 3.2 3.0 3.1 2.9 1.2 1.4

3.98 6.71 3.41 6.51 1.81 3.83 1.85 3.27 3.92 3.58 1.4 0.7 0.8

3.3 5.6 2.8 3.5 3.4 3.3 3.0 3.2 2.9 3.2 2.6 1.0 1.4

OPEP6f 0-5 0-30 3.30(20) 6.01(13) 2.50(54) 3.59(22) 3.35(20) 3.52(18) 3.40(13) 3.10(22) 3.21(10) 3.30(21) 2.5 1.1 1.4

3.36(10) 5.99(9) 3.01(10) 3.60(31) 3.31(10) 3.33(24) 3.07(6) 3.22(17) 3.03(4) 3.26(14) 2.5 1.1 1.3

the predicted pKa for HIS166 was even more apart from the experimental value (6.46(5) vs. 5.39(2)). This indicates that the separation distance between titratable groups in the neighbor of HIS166 were not the same observed during the MD trajectory in respect to the experimental input protein conformation. Other theoretical methods also have difficulties to well reproduce the experimental behavior of this amino acid. For instance, simulations with all-atom REX-CpHMD gives 4.4(0) which has also a similar difference of about 1 unit of pH with respect to the experimental pKa value. As seen for the FPTSrigid , the deviations obtained by the OPEP6 are a little smaller than the others for this protein system.

pKa calculation for HEWL Following the same sequence of the most interesting proteins used before in our previous pKa benchmark of the F P T Srigid method, the third studied system here was HEWL. Due to its high number of ionizable chemical groups, this is one of proteins more tested in pKa benchmarks as it can be seen in several papers (e.g. 3,12,35,39,76–78 ). 24

ACS Paragon Plus Environment

Page 25 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Experimental pKa s and the corresponding values calculated by GB, all-atom REXCpHMD, SBM, PropKa, and FPTSrigid and OPEP6 are compared in Table 3.The smallest deviations are observed for the theoretical predictions given by PropKa (RMDE= 0.8). This is not a surprise due to the smart parametrization strategy adopted by PropKa to fit the calculated pKa to reproduce well experimental pKa s. In fact, the parametrization focused on Asp and Glu, 7 which should increase the success when applied in a protein rich on these amino acids as is HEWL. The results obtained by the other methods including OPEP6 are similar except the MAX observed for the SBM that is relatively higher than all the others (MAX= 2.9). There is a modest improvement in OPEP6 (RMSE= 1.3) when compared to FPTSrigid (RMSE= 1.4). As already discussed, this is probably due to the simplification in the bead description adopted by our titration scheme. It is interesting to observe that regardless of its approximations, OPEP6 has an equivalent accuracy to the more computationally demanding approach followed by the all-atom REX-CpHMD. For instance, both methods predicted pKa values with a RMSE equals to 1.4 in the short simulation runs of 5 ns. After convergence been reached by OPEP6, the difference between them is also marginal (≈ 0.1 pH units) for all the three main descriptors. OPEP6 is more accurate than the SBM even if this other CG method used specific parameters to reproduce the crystallographic structure of this protein. The residue GLU-35 remains a challenge for the theoretical methods as already discussed. 12 Despite the larger fluctuations (as seen in the experimental data), the pKa for GLU-35 is 0.2 pH units closer to the experimental value (6.1) in the predicted pKa by OPEP6 (3.7) in comparison to FPTSrigid (3.5). There seems to be a trend that theoretical methods using implicit solvent (GB, FPTSrigid , OPEP6) tend to underestimate the protonation of this amino acid while the predictions by explicit solvent methods as the all-atom REX-CpHMD overestimated it. This was also observed by the predicted values by the SBM (pKa( GLU −35) = 3.2) 39 and Pcetk/pDynamo (pKa( GLU −35) = 5.4) 78 Deeper analysis of the similarities and differences that affect the proper reproduction of experimental pKa s by all

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 43

Table 4: Calculated and experimental pKa values of the NTL9. Salt concentration is 100mM. a Data taken from ref. 79 b Data taken from ref. 66 c Data taken from ref. 38 d Tyrosines were not free to titrate as a function of solution pH. f The data for OPEP6 is the average from 10 independent 10ns trajectories. Residue ASP8 GLU17 ASP23 GLU38 GLU48 GLU54

Experimenta 3.0 3.6 3.1 4.0 4.2 4.2 MAX AAD RMSE

GBb 3.19(20) 3.67(13) 2.11(11) 3.70(19) 3.74(20) 3.64(8) 1.0 0.4 0.5

All-atom REX-CpHMDb 2.83(7) 3.57(14) 2.75(16) 3.38(30) 3.47(17) 3.65(22) 1.2 0.4 0.6

SBMc,d 1.9 3.0 3.2 3.6 3.5 3.4 1.1 0.6 0.7

propKa 3.84 4.47 2.83 4.71 4.62 4.57 0.9 0.6 0.6

FPTSrigid d,e 2.61 2.91 3.58 4.07 3.53 3.83 0.7 0.4 0.5

OPEP6d,f 3.04(51) 2.69(99) 3.63(14) 3.25(83) 3.55(15) 3.63(16) 0.9 0.6 0.6

these theoretical methods might contribute to their improvement.

pKa calculation for NTL9 The N-terminal domain from the ribosomal protein L9 has been investigated before by different theoretical methods including GB, all-atom REX-CpHMD and the SBM. These results together with outcomes obtained by OPEP6, FPTSrigid and propKa simulations are listed in table 4. The smallest RMSEs (0.5) are found for GB and the FPTSrigid . For GB, the difference is more concentrated in a single amino acid (ASP8) while for the latter more amino acids are affected with a smaller deviation from the experimental value. Most calculations were carried out with a neutral TYR as done in other works. 38 To access the effect of this assumptions, we performed FPTSrigid simulations with both situations. If TYR is allowed to titrate, the RMSE decreases to 0.4 for FPTSrigid while the differences in the pKa s are marginal. The larger deviation between a simulation with a titratable TYR and a constant neutral TYR case is 0.08 seen for ASP8 (pKa = 2.69 for a protonated TYR). The averaged and RMSEs are 0.04 and 0.05, respectively. These numbers are one order of magnitude smaller than common AADs and RMSEs found between predicted and experimental pKa s. OPEP6 gives results equivalent to propKa and all-atom REX-CpHMD. The less accurate 26

ACS Paragon Plus Environment

Page 27 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

data (RMSE equals to 0.7) was given by the SBM even using a specific set of parameters to keep NTL9 closer to its native structure and very long simulations (order of µs). As observed for other protein systems, standard deviations on the pKa s measurements are relatively large for OPEP6. Similar trends can also be seen by the other methods based on the MD sampling. Comparing OPEP6 with its rigid version based on a single static protein configuration shows no improvement of OPEP6’s pKa s for this protein system. This might be related to the fact that the force field parametrization of OPEP was constructed on protein structures often obtained at a fixed solution pH. Typically, pH 7 was used in the structural experiments.

pKa calculation for a protein G variant Another protein included in this benchmark study was a protein G variant with mutations T2Q, N8D and N37D. The main reason to discuss this protein system here is the possibility to compare the present outcomes with another method, the TKSA, 39 also rooted in the TK model 60,61 as is the FPTS. Table 5 shows the outcomes from experimental and the computed pKa values by SMB, TKSA, PropKa, FPTSrigid and OPEP6. Despite some common theoretical aspects between the TKSA and the FPTS (either as FPTSrigid or as in OPEP6), TKSA has difficulties to well describe the protonation states of this protein. The RMSE is quite high (1.8) compared with the other methods. Even the ”NULL model“, 80,81 where site-site interactions are altogether neglected, would give a much more accurate result than the TKSA (RMSEN U LL = 1.0). The large deviation of 4.2 pH units observed for GLU27 (the predicted pKa was 0.6 while the experimental value is 4.8) and the other high values for GLU15, ASP37 and GLU56 contribute with this weak performance. Except for TKSA, the other theoretical methods predicts pKa s with a similar accuracy (RMSE= 1.0). We tested the effect to allow TYR to titrate on FPTSrigid for this protein. The outcomes (not listed in table 5) for this specific problem were the same as observed for the used model with the commonly used assumption that TYR should remain constantly neutral. OPEP6 has an identical (good) performance as the FPTSrigid . The results are also equivalent with the SBM 27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 43

Table 5: Calculated and experimental pKa values of the Protein G Variant with mutations T2Q, N8D and N37D. Salt concentration is 10mM. Tyrosines were not free to titrate as a function of solution pH. A neutral state was assumed for them in all calculations following ref. 38 a Data taken from ref. 82 b Data taken from ref. 39 c Decoupled MD simulation with OPEP force-field. d The data for OPEP6 is the average from 5 independent 10 ns trajectories. Residue Asp8 Glu15 Glu19 Asp22 Glu27 Asp36 Asp37 Asp40 Glu42 Asp46 Asp47 Glu56

Experimenta 4.9 4.6 3.9 3.0 4.8 4.2 6.5 4.1 4.8 3.8 3.1 3.8 MAX AAD RMSE

SMB 3.5 4.0 4.0 3.7 3.5 4.1 4.1 4.0 4.6 4.0 3.6 4.6 2.4 0.7 1.0

b

TKSAb 4.2 1.8 4.0 3.4 0.6 4.1 4.0 4.3 5.0 4.0 2.9 6.5 4.2 1.2 1.8

propKa 3.89 3.61 2.97 2.48 3.52 3.39 4.17 3.98 4.59 3.45 2.15 2.96 2.3 0.9 1.0

FPTSrigid 3.1 4.0 4.4 3.7 3.4 4.0 4.0 3.9 4.9 3.8 3.5 4.1 2.5 0.7 1.0

FPTSrigid (MD)c 2.7 4.1 4.4 3.6 2.7 4.0 3.9 3.4 5.0 3.8 3.9 4.8 2.6 0.9 1.3

OPEP6d 3.44(3) 4.18(20) 4.32(10) 3.69(15) 3.74(21) 3.77(18) 3.96(16) 3.77(12) 4.91(8) 3.93(7) 3.78(63) 4.50(43) 2.5 0.7 1.0

using much shorter simulations. The residues ASP47 and GLU56 show larger fluctuations in their predicted pKa s by OPEP6 (≈ 0.5) probably due to the diversity of protein conformations produced during the trajectory and a high electrostatic coupling between these two titratable groups. Since this is a protein system with a large number of available experimental titratable groups, an additional MD simulation run for tests purposes with the decoupled protonation-protein conformational simulation was also carried out. It can be seen that the three descriptors have higher values in this case when compared to OPEP6: MAX, AAD and RMSE are, respectively, 2.6, 0.9 and 1.3 for FPTSrigid (MD) instead of 2.5, 0.7 and 1.0 achieved with OPEP6. This result reflects that the protein conformations produced by OPEP6 at different pHs are different from the ones obtained by OPEP5 and with a behavior now in OPEP6 closer to the experimental one.

28

ACS Paragon Plus Environment

Page 29 of 43

GLN-5A, SER-9A

GLU-4A, ALA-30B

15

35 pH 7 pH 10

pH 7 pH 10

30

dMD (Å)

25 dMD (Å)

10

20

15

10 5 5 0

5

10

time (ns)

15

5

0

time (ns)

10

15

GLU-4A, GLY-1A 15

dMD (Å)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

pH 7 pH 10

10

5

0

5

time (ns)

10

15

Figure 4: Structural features of INS at different solution pHs. Separation distances (dM D ) related to the experimental relative weights of multiple protein conformers were measured during 15 ns of MD trajectories at pH 7 and 10. Data at 100mM of salt for key regions: a) Top Left: GLN-5A and SER-9A cavity (distance given for the OPEP side chains GLN and SER beads). b) Top Right: GLU-4A and ALA-30B (distance given for the OPEP side chain GLU and ALA/CB beads). c) Bottom: GLU-4A and GLY-1A (distance given for the OPEP side chain GLU and the GLY/NT beads).

Structural properties of insulin The molecular conformational changes as a function of pH of the peptide hormone INS have been well experimentally described by Gursky and collaborators. 67 From solution pH 7 to 10, they explored the INS conformations as specific amino acids titrate. Therefore, this is an interesting system to be investigated by a CpH method in order to verify if the protein

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

structures produced during the simulation run are experimentally meaningful also at the structural level. Unfortunately, most of the protein atoms do not differ much upon the pH change, 67 and we have to focus on a few local pH-dependent structural changes. Residues with this pH dependency involved in two specific regions are particularly appealing because they appear with not the same probabilities of conformers at different pH solutions which allow us to calculate related structural properties from the MD trajectories. These residues are those interacting with GLN-5A and GLU4-A. They are also related to the ”open“ (when the cleft across the interface between the INS dimer is filled by ordered water molecules) and ”closed“ (when GLN-5A is oriented toward the dimer interface and expel water molecules). 67 Using the occupancies of the alternative amino acids conformers, Gursky and co-authors mapped experimentally the relative weights of these multiple protein conformers defining the probabilities for their ”open“ and ”closed“ states. Since these states also correlated with the separation distances between the amino acids as discussed in their work, we measured the corresponding distances in the MD trajectories obtained with OPEP6. The main results are given in Fig. 4. At pH 7, 70% of the INS conformers are experimentally at the ”open“ state while 30% were found at the ”closed“ state. 67 Conversely, at pH 10, all the conformers are at the ”closed“ state. 67 This implies that the separation distance between the amino acids involved in the GLN-5A cavity should be larger at pH 7 than at pH 5. In fact, as seen in Fig. 4a, we observed during the MD runs that the separation distance (dM D ) between GLN-5A and SER-9A are larger at pH 7 (dM D = 10.0 ± 1.7˚ A) in agreement with the experimental behavior. At pH 10, the averaged separation distance is smaller (dM D = 7.0 ± 1.0˚ A). For the pair of residues GLU-4A and ALA-30B (see Fig. 4b), we measured separation distances equal to 16.5 ± 4.8˚ A and 11.9 ± 4.6˚ A, respectively, at pH 7 and 10. The trend is similar to what was seen for the pair GLN-5A–SER-9A although both the observed variations in Fig. 4b and the estimated errors turn difficult to make a strong statement. Another pair discussed in the experimental work is GLU-4A and GLY-1A. It was argued that the separation distance between GLU-4A and GLY-1A is increased at solution pHs ≥ 9. 67 This is well reproduced

30

ACS Paragon Plus Environment

Page 30 of 43

Page 31 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

in the simulations as it can be observed in Fig. 4c. We found dM D = 7.8 ± 1.6˚ A for the sampling at pH 7 and indeed a larger value at pH 10 (dM D = 11.9 ± 1.1˚ A).

Global analysis for this set of proteins The overall accuracy of OPEP6 for this specific set of protein systems in terms of the pKa s is quite similar with the ones obtained by its core titration engine with a marginal improvement. The AAD and RMSE calculated using all the data reported on the previous tables are the same for both OPEP6 and FPTSrigid (AAD= 0.7 and RMSE= 0.9). The MAX value decreased from 2.6 for FPTSrigid to 2.5 for OPEP6. These results are quite closer to the accurate achieved by PropKa. In the latter, the MAX, AAD and RMSE are, respectively, 2.3, 0.6 and 0.8 for the same set of proteins. We note that in a larger set of proteins, the results for propKa were 5.2, 1.0 and 1.4, 83 while FPTSrigid achieved 5.2, 0.8 and 1.5 (with the inclusion of proteins with titratable groups very deeply located in the proteins) or 4.9, 1.0 and 1.4 (with the absence of the deeply located groups). 12 Moreover, the correlations between experimental and predicted pKa s by OPEP6 are essentially the same as the ones obtained by FPTSrigid . 12 Depending on the set of proteins included in the study, they can be even slightly more correlated than the outcomes provided by propKa. 12 Therefore, the results obtained by OPEP6 are in within the typical range of values given by different theoretical models. 35 Together with the improvement in the description of the structural features as a function of pH, all these data confirms the robustness and ability of OPEP6 to properly describe the system’s physics at lower cpu costs.

Performance Lastly, we briefly discuss the computational performance of the simulations using OPEP6. Each call to the titration scheme during the MD sampling is done at a minimal computational cost. A considerably gain achieved by the FPTS is the drastically reduction in computational time which permits the repetition of the calculation for many protein conformations passed 31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

through the MD engine. The averaged running time if the FPTS after a call from the MD engine based on 5 independent measurements (given in s) is 0.063, 0.052, 0.182, 0.036, 0.039 and 0.068, for, respectively, protein G variant, BBL, HEWL, HP36, INS and NTL9. This set of times was estimated in a personal notebook (Intel i7-5500U and 2.40 GHz – running ubuntu 16.04LTS). Consequentially, they have a marginal impact on the MD engine. For instance, the cpu time increases only by 1.009 when OPEP5 is replaced by OPEP6 for a protein like the protein G variant at pH 7 in a single 5ns MD run.

Conclusion We present here as a proof-of-concept an initial version of a new computational tool that combines the main features of previously developed coarse-grained protein force field, OPEP5, with a fast and accurate titration scheme, FPTS. Although this new tool named OPEP6 offered a modest improvement in the predicted pKa values, it is now possible to obtain a detailed knowledge of the pH-dependent dynamical processes. The method is general and not specific for a given protein as other available CpH CG models. This is an important step forward to precisely quantify pH effects on the protein structure, stability and interactions (e.g. protein folding, enzyme Catalysis, amyloids aggregation, drug interactions, etc.). The accuracy of OPEP6 was quantified by means of critical tests comparing the predicted outcomes from OPEP6 with experimental data and other theoretical tools prediction. Different proteins with a distinct number of ionizable groups, structural features and biological functions were included in this benchmark. Deviations are found within the typical range of values given by other theoretical models. The average, maximum absolute and root-mean-square deviations were measured as [0.3 − 1.1], [0.6 − 2.5] and [0.4 − 1.3] pH units, respectively, for the five studied proteins. The good agreement between the predicted values by OPEP6 is per se an important achievement. Moreover, the reproduction of the local pH-dependent structural properties of insulin indicates that OPEP6 is able to also describe

32

ACS Paragon Plus Environment

Page 32 of 43

Page 33 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

protonation-coupled conformational dynamics. Together, these results show that OPEP6 captures the basic physics of the real systems with virtually no impact in the cpu costs of an ordinary OPEP MD run. Further developments are necessary and include the refinement and addition of all possible charge-charge interaction between titratable groups in the ad hoc ion-ion potential terms. A compromise between a less coarse-grained version of the FPTSrigid and the necessary frequency of call might provide a better manner to introduce larger ionizable effects on the protein conformations. Nevertheless, despite these possible improvements, the present achieved success of the new OPEP6 already opens up more possibilities to study pH-mediated biological processes. It can play an important role to address multibiomolecular systems with a large number of titratable groups highly electrotrastically coupled.

Acknowledgement This work has been supported in part by the Funda¸ca˜o de Amparo a Pesquisa do Estado de S˜ao Paulo [Fapesp 2015/16116-3 (FLBDS)], the Conselho Nacional de Desenvolvimento Cientifico e Tecnol´ogico (CNPq), the Universit´e Sorbonne Paris Cit´e (USPC) and the French grant DYNAMO - ANR-11-LABX-0011-01. FLBDS thanks also the support of the University of S˜ao Paulo through the NAP-CatSinQ (Research Core in Catalysis and Chemical Synthesis), the computing hours at The Swedish National Infrastructure for Computing (SNIC 2018/1-25) and the hospitality of the Institut de Biologie Physico Chimique/USPC. PhD also thanks support from University Paris 7 and CNRS.

References (1) Chen, W.; Morrow, B. H.; Shi, C.; Shen, J. K. Recent development and application of constant pH molecular dynamics. Mol. Sim. 2014, 40, 830–838. 33

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(2) Kim, M. O.; McCammon, J. A. Computation of pH-Dependent Binding Free Energies. Biopolymers 2016, 105, 43–49. (3) Barroso da Silva, F. L.; Dias, L. G. Development of constant-pH simulation methods in implicit solvent and applications in biomolecular systems. Biophys Rev 2017, 699–728. (4) Bashford, D. In Scientific Computing in Object-Oriented Parallel Environments: First International Conference, ISCOPE 97 Marina del Rey, California, USA December 8– 11, 1997 Proceedings; Ishikawa, Y., Oldehoeft, R. R., Reynders, J. V. W., Tholburn, M., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 1997; pp 233–240. (5) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; McCammon, J. A. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 2001, 98, 10037–10041. (6) Li, H.; Robertson, A. D.; Jensen, J. H. Very fast empirical prediction and rationalization of protein pKa values. Proteins: Structure, Function, and Bioinformatics 2005, 61, 704–721. (7) Olsson, M. H.; Sondergard, C. R.; Rostkowski, M.; Jensen, J. H. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa predictions. J. Chem. Theory and Computation 2011, 7, 525–537. (8) Wang, L.; Li, L.; Alexov, E. pKa predictions for proteins, RNAs, and DNAs with the Gaussian dielectric function using DelPhi pKa. Proteins 2015, 83, 2186–2197. (9) Kesvatera, T.; J¨onsson, B.; Thulin, E.; Linse, S. Measurement and Modelling of Sequence-specific pKa Values of Calbindin D9k . J. Mol. Biol. 1996, 259, 828. (10) Teixeira, A. A.; Lund, M.; Barroso da Silva, F. L. Fast Proton Titration Scheme for Multiscale Modeling of Protein Solutions. Journal of Chemical Theory and Computation 2010, 6, 3259–3266. 34

ACS Paragon Plus Environment

Page 34 of 43

Page 35 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(11) Carnal, F.; Claviera, A.; Stoll, S. Modelling the interaction processes between nanoparticles and biomacromolecules of variable hydrophobicity: Monte Carlo simulations. Environ. Sci.: Nano 2015, 327–339. (12) Barroso da Silva, F. L.; MacKernan, D. Benchmarking a Fast Proton Titration Scheme in Implicit Solvent for Biomolecular Simulations. J. Chem. Theory Comput. 2017, 13, 2915–2929. (13) Baptista, A. M.; Marte, P. J.; Petersen, S. B. Simulation of Protein Conformational Freedom as a Function of pH: Constant-pH Molecular Dynamics Using Implicit Titration. Proteins: Struc., Func., and Genetics 1997, 27, 523–544. (14) Santos, H. A. F.; cosa, D. V.-V.; Teixeira, V. H.; Baptista, A. M.; Machuqueiro, M. Constant-pH MD Simulations of DMPA/DMPC Lipid Bilayers. J. Chem. Theory Comput. 2015, 11, 5973–5979. (15) Fuzo, C. A.; Degr`eve, L. The pH dependence of flavivirus envelope protein structure: insights from molecular dynamics simulations. J. Biomol. Struct. Dyn. 2014, 32, 1563– 1574. (16) Lee, M. S.; Jr., F. R. S.; Brooks, C. L. Constant-pH molecular dynamics using continuous titration coordinates. Proteins 2004, 56, 738–752. (17) Knight, J. L.; III, C. L. B. Multisite λ Dynamics for Simulated Structure-Activity Relationship Studies. J. Chem. Theory Comput. 2011, 7, 2728–2739. (18) Chen, W.; Huang, Y.; Shen, J. K. Conformational Activation of a Transmembrane Proton Channel from Constant pH Molecular Dynamics. J. Phys. Chem. Lett. 2016, 7, 3961–3966. (19) Socher, E.; Stich, H. Mimicking titration experiments with MD simulations: A proto-

35

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

col for the investigation of pH-dependent effects on proteins. Scientific Reports 2016, 22523, 1–12. (20) Donnini, S.; Ullmann, R. T.; Groenhof, G.; Grubm¨ller, H. Charge-Neutral Constant pH Molecular Dynamics Simulations Using a Parsimonious Proton Buffer. J. Chem. Theory Comput. 2016, 12, 1040–1051. (21) Tummanapelli, A.; Vasudevan, S. Ab Initio Molecular Dynamics Simulations of Amino Acids in Aqueous Solutions: Estimating pKa Values from Metadynamics Sampling. J. Phys. Chem. B 2015, 119, 12249–12255. (22) Kamerlin, S. L.; Haranczyk, M.; Warshel, A. Progresses in Ab Initio QM/MM Free Energy Simulations of Electrostatic Energies in Proteins: Accelerated QM/MM Studies of pKa, Redox Reactions and Solvation Free Energies. J. Phys. Chem. B 2009, 113, 1253–1272. (23) Jensen, J. H.; Li, H.; Robertson, A. D.; Molina, P. A. Prediction and Rationalization of Protein pKa Values Using QM and QM/MM Methods. J. Phys. Chem. A 2005, 109, 6634–6643. (24) Lund, M.; J¨onsson, B. A mesoscopic model for protein-protein interactions in solution. Biophys. J. 2003, 85, 2940–2947. (25) Lund, M.; J¨onsson, B. On the Charge Regulation of Proteins. Biochemistry 2005, 44, 5722–5727. (26) Barroso da Silva, F. L.; Lund, M.; J¨onsson, B.; ˚ Akesson, T. On the Complexation of Proteins and Polyelectrolytes. J. Phys. Chem. B 2006, 110, 4459–4464. (27) J¨onsson, B.; Lund, M.; Barroso da Silva, F. L. Electrostatics in Macromolecular Solution. Food Colloids: Self-Assembly and Material Science. Londres, 2007; pp 129–154.

36

ACS Paragon Plus Environment

Page 36 of 43

Page 37 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(28) Persson, B.; Lund, M.; Forsman, J.; Chatterton, D. E. W.; ˚ Akesson, T. Molecular evidence of stereo-specific lactoferrin dimers in solution. Biophys Chem. 2010, 3, 187– 189. (29) Kurut, A.; Persson, B. A.; ˚ Akesson, T.; Forsman, J.; Lund, M. Anisotropic Interactions in Protein Mixtures: Self Assembly and Phase Behavior in Aqueous Solution. J. Phys. Chem. Lett. 2012, 3, 731–734. (30) Kurut, A.; Dicko, C.; Lund, M. Dimerization of Terminal Domains in Spiders Silk Proteins Is Controlled by Electrostatic Anisotropy and Modulated by Hydrophobic Patches. ACS Biomater. Sci. Eng. 2015, 1, 363–371. (31) Delboni, L.; Barroso da Silva, F. L. On the complexation of whey proteins. Food Hydrocolloids 2016, 55, 89–99. (32) Barroso da Silva, F. L.; Pasquali, S.; Derreumaux, P.; Dias, L. G. Electrostatics analysis of the mutational and pH effects of the N-terminal domain self-association of the Major Ampullate Spidroin. Soft Matter 2016, 12, 5600–5612. (33) Kong, X.; III, C. L. B. λ-dynamics: a new approach to free energy calculations. J. Chem. Phys. 1996, 105, 128–141. (34) Chen, Y.; Roux, B. Constant-pH Hybrid Nonequilibrium Molecular DynamicsMonte Carlo Simulation Method. J. Chem. Theory Comput. 2015, 11, 3919–3931. (35) Chen, W.; Wallace, J. A.; Yue, Z.; Shen, J. K. Introducing Titratable Water to AllAtom Molecular Dynamics at Constant pH. Biophys. J. 2013, 105, L15–L17. (36) Williams, S. L.; de Oliveira, C. A. F.; McCammon, J. A. Coupling Constant pH Molecular Dynamics with Accelerated Molecular Dynamics. J. Chem. Theory Comput. 2010, 6, 560–568.

37

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(37) Radak, B. K.; Roux, B. Efficiency in nonequilibrium molecular dynamics Monte Carlo simulations. J. Chem. Phys. 2016, 145, 124109. (38) Contessoto, V. G.; de Oliveira, V. M.; de Carvalho, S. J.; Oliveira, L. C.; Leite, V. B. P. NTL9 Folding at Constant pH: The Importance of Electrostatic Interaction and pH Dependence. J. Chem. Theory Comput. 2016, 12, 3270–3277. (39) de Oliveira, V. M.; de Godoi Contessoto, V.; da Silva, F. B.; Caetano, D. L. Z.; de Carvalho, S. J.; Leite, V. B. P. Effects of pH and Salt Concentration on Stability of a Protein G Variant Using Coarse-Grained Models. Biophys. J. 2018, 114, 65–75. (40) Noguti, T.; Go, N. Efficient Monte Carlo Method for Simulation of Fluctuating Conformations of Native Proteins. Biopolymers 1985, 24, 527–546. (41) Wei, G. H.; Derreumaux, P.; Mousseau, N. J. Chem. Phys. 2003, 119, 6403–6406. (42) Maupetit, J.; Tuffery, P.; Derreumaux, P. A coarse-grained protein force field for folding and structure prediction. Proteins 2007, 69, 394–408. (43) Song, W.; Wei, G.; Mousseau, N.; Derreumaux, P. Self-Assembly of the β2Microglobulin NHVTLSQ Peptide Using a Coarse-Grained Protein Model Reveals a β2-Barrel Species. J. Phys. Chem. B 2008, 112, 4410–4418. (44) Chebarro, Y.; Pasquali, S.; Derreumaux, P. The Coarse-Grained OPEP force field for non-amyloid and amyloid proteins. J. Phys. Chem. B 2012, 116, 8741–8752. (45) Sterpone, F.; Melchionna, S.; Tuffery, P.; Pasquali, S.; Mousseau, N.; Cragnolini, T.; Chebaro, Y.; Saint-Pierre, J.-F.; Kalimeri, M.; Barducci, A.; Laurin, Y.; Tek, A.; Baaden, M.; Nguyen, P. H.; Derreumaux, P. THE OPEP COARSE-GRAINED PROTEIN MODEL: FROM SINGLE MOLECULES, AMYLOID FORMATION, ROLE OF MACROMOLECULAR CROWDING AND HYDRODYNAMICS TO RNA/DNA COMPLEXES. Chem Soc Rev. 2014, 43, 4871–4893. 38

ACS Paragon Plus Environment

Page 38 of 43

Page 39 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(46) Kalimeri, M.; Derreumaux, P.; Sterpone, F. J. Non Crys Sol 2015, 407, 494–501. (47) Sterpone, F.; Nguyen, P. H.; Kalimeri, M.; Derreumaux, P. Importance of the ion-pair interactions in the OPEP coarse-grained force field: parametrization and validation. J. Chem. Theory Comput. 2013, 9(10), 4574–4584. (48) Barroso da Silva, F. L.; Derreumaux, P.; Pasquali, S. Fast coarse-grained model for RNA titration. J. Chem. Phys. 2017, 146, 035101+. (49) Davies, M. N.; Toseland, C. P.; Moss, D. S.; Flower, D. R. Benchmarking pKa prediction. BMC Biochemistry 2006, 7, 1–12. (50) Baptista, A. M.; Teixeira, V. H.; Soares, C. M. Constant-pH molecular dynamics using stochastic titration. J. Chem. Phys. 2002, 117, 293–309. (51) Kirkwood, J. G.; Shumaker, J. B. Forces Between Protein Molecules in Solution Arising from Fluctuations in Proton Charge and Configuration. Proc. Natl. Acad. Sci. USA 1952, 38, 863–871. (52) Barroso Da Silva, F. L.; J¨onsson, B. Polyelectrolyte-protein complexation driven by charge regulation. Soft Matter 2009, 5, 2862–2868. (53) Lund, M.; J¨onsson, B. Charge regulation in biomolecular solution. Quarterly Reviews of Biophysics 2013, 46, 265–281. (54) Barroso da Silva, F. L.; Bostr¨om, M.; Persson, C. Effect of Charge Regulation and IonDipole Interactions on the Selectivity of ProteinNanoparticle Binding. Langmuir 2014, 30, 4078–4083. (55) Nasica-Labouze, J. et al. Amyloid β Protein and Alzheimers Disease: When Computer Simulations Complement Experimental Studies. Chem. Rev. 2015, 115, 3518–3563. (56) Selkoe, D. J.; Hardy, J. The amyloid hypothesis of Alzheimer’s disease at 25years. EMBO Molecular Medicine 2016, 8, 595–608. 39

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(57) Carballo-Pacheco, M.; Ismail, A. E.; Strodel, B. On the Applicability of Force Fields To Study the Aggregation of Amyloidogenic Peptides Using Molecular Dynamics Simulations. J. Chem. Theory Comput. 2018, 14, 6063–6075. (58) Kouza, M.; Banerji, A.; Kolinski, A.; Buhimschi, I. A.; Kloczkowski, A. Oligomerization of FVFLM peptides and their ability to inhibit beta amyloid peptides aggregation: consideration as a possible model. Physical Chemistry Chemical Physics 2017, 19, 2990–2999. (59) Proctor, E. A.; Fee, L.; Tao, Y.; Redler, R. L.; Fay, J. M.; Zhang, Y.; Lv, Z.; Mercer, I. P.; Deshmukh, M.; Lyubchenko, Y. L.; Dokholyan, N. V. Nonnative SOD1 trimer is toxic to motor neurons in a model of amyotrophic lateral sclerosis. Proc. Natl. Acad. Sci. USA 2016, 113, 614–619. (60) Tanford, C.; Kirkwood, J. G. Theory of Protein Titration Curves I. General Equations for Impenetrable spheres. J. Am. Chem. Soc. 1957, 79, 5333–5339. (61) Barroso da Silva, F. L.; J¨onsson, B.; Penfold, R. A critical investigation of the TanfordKirkwood scheme by means of Monte Carlo simulations. Prot. Sci. 2001, 10, 1415–1425. (62) Beresford-Smith, B.; Chan, D. Y. C. Electrical double-layer interactions in concentrated colloidal systems. Faraday Disc. Chem. Soc. 1983, 76, 65–75. (63) de Carvalho, S. J.; Ghiotto, R. C. T.; Barroso da Silva, F. L. Monte Carlo and Modified Tanford-Kirkwood Results for Macromolecular Electrostatics Calculations. J. Phys. Chem. B 2006, 110, 8832–8839. (64) Nozaki, Y.; Tanford, C. Examination of Titratation Behavior. Methods Enzymol. 1967, 11, 715–734. (65) Masunov, A.; Lazaridis, T. Potentials of Mean Force between Ionizable Amino Acid Side Chains in Water. J. Am. Chem. Soc. 2203, 125, 1722–1730. 40

ACS Paragon Plus Environment

Page 40 of 43

Page 41 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(66) Wallace, J. A.; Shen, J. K. Continuous constant pH molecular dynamics in explicit solvent with pH-based replica exchange. J. Chem. Theory Comput. 2011, 7, 2617– 2629. (67) Gursky, O.; Badger, J.; Li, Y.; Caspar, D. L. Conformational changes in cubic insulin crystals in the pH range 711. Biophys. J. 1992, 63, 1210–1220. (68) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Research 2000, 28, 235–242. (69) OPEP Files Generator. http://opep.galaxy.ibpc.fr. (70) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Greenblatt, G. S.; Meng, E. C.; Ferrin, T. E. UCSF Chimera: a Visualization System for Exploratory Research and Analysis. J. Comp. Chem. 2004, 25, 1605–1612. (71) Humphrey, W.; Dalke, A.; Schulten, K. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics 1996, 14, 33–38. (72) Metropolis, N. A.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087– 1097. (73) Frenkel, D.; Smit, B. Understanding Molecular Simulation: From Algorithms to Applications; Academic Press: San Diego, 1996. (74) Stenqvist, B.; Thuresson, A.; Kurut, A.; V´acha, R.; Lund, M. Faunus – A flexible framework for Monte Carlo simulation. Mol. Sim. 2013, 39, 1233–1239. (75) Barroso da Silva, F. L.; Derreumaux, P.; Pasquali, S. Protein-RNA complexation driven by the charge regulation mechanism. Biochem. Biophys. Res. Commun. 2018, 298, 264–273. 41

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(76) Bashford, D.; Karplus, M. pKa’s of Ionizable Groups in Proteins: Atomic Detail from a Continuum Electrostatic Model. Biochemistry 1990, 29, 10219–10225. (77) Srivastava, D.; Santiso, E.; Gubbins, K.; Barroso da Silva, F. L. Computationally Mapping pKa Shifts Due to the Presence of a Polyelectrolyte Chain around Whey Proteins. Langmuir 2017, 33, 11417–11428. (78) Feliks, M.; Field, M. J. Pcetk: A pDynamo-based Toolkit for Protonation State Calculations. J. Chem. Inf. Model. 2015, 55, 2288–2296. (79) Kuhlman, B.; Luisi, D. L.; Young, P.; Raleigh, D. P. pKa Values and the pH Dependent Stability of the N-Terminal Domain of L9 as Probes of Electrostatic Interactions in the Denatured State. Differentiation between Local and Nonlocal Interactions. Biochemistry 1999, 38, 4896–4903. (80) Schutz, C. N.; Warshel, A. What Are the Dielectric “constants” of Proteins and How To Validate Electrostatic Models? Proteins: Struc., Func., and Genetics 2001, 44, 400–417. (81) Carstensen, T.; Farrell, D.; Huang, Y.; Baker, N. A.; Nielsen, J. E. On the development of protein pka calculation algorithms. Proteins 2011, 79, 3287–3298. (82) Lindman, S.; Bauer, M. C.; Lund, M.; Diehl, C.; Mulder, F. A. A.; Akke, M.; Linse, S. pKa Values for the Unfolded State under Native Conditions Explain the pH-Dependent Stability of PGB1. Biophys. J. 2010, 99, 3365–3373. (83) Stanton, C. L.; Houk, K. N. Benchmarking pKa Prediction Methods for Residues in Proteins. J. Chem. Theory Comput. 2008, 4, 951–966.

42

ACS Paragon Plus Environment

Page 42 of 43

Page 43 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

43

ACS Paragon Plus Environment