
Biomolecular Systems

Design of protein-protein binding sites suggests a rationale for naturally occurring contact areas

Francesca Nerattini, Luca Tubiana, Chiara Cardelli, Valentino Bianco, Christoph Dellago, and Ivan Coluzza

J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00667 • Publication Date (Web): 11 Dec 2018


Journal of Chemical Theory and Computation is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Theory and Computation

Design of protein-protein binding sites suggests a rationale for naturally occurring contact areas

Francesca Nerattini,† Luca Tubiana,† Chiara Cardelli,† Valentino Bianco,† Christoph Dellago,† and Ivan Coluzza∗,‡

†Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
‡CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain. IKERBASQUE, Basque Foundation for Science, 48013 Bilbao, Spain.
E-mail: [email protected]

Abstract

Molecular recognition is a critical process for many biological functions and consists of the non-covalent binding of different molecules, as in protein-protein and antigen-antibody complexes, among many others. The host and guest molecules involved often show shape complementarity, and one of the leading requirements for molecular recognition is that the interaction should ideally be specific, i.e. the host should bind strongly and exclusively to one selected guest. Our work focuses on the role played by chemical heterogeneity and steric compatibility in the specificity of the binding site between two proteins. We tackle the problem computationally, reducing the complexity of the system by simulating a protein and a surface-like element that is shaped on part of the protein surface and represents the binding site of an interaction partner. We investigate four systems, differing in binding site size. A significant result is that, although the protein and surface sequences are interdependent and simultaneously generated to stabilise the bound folded structure, the protein is stable in the folded conformation even in the absence of the surface-like partner for all investigated systems. We observe that increasing the surface area significantly increases the binding affinity. Interestingly, our data suggest the presence of upper and lower limits on the area of a binding site. Our data match the experimental observation of such limits (750-1500 Å², ref. 1) and provide a rationale for them: the extent of the binding site area is limited by the value of the binding constant. For large contact areas, at physiological conditions, the binding is orders of magnitude stronger (Ka > 10^40 l/mol) than what is typically observed in natural biological processes. Conversely, the smallest surface tested is just the minimal size that allows specific binding.

Introduction

In all living systems, proteins are often required to interact with each other in order to fulfil their functions. Indeed, protein-protein interactions are essential for many biological processes, such as signal transduction, vesicle transport and cell metabolism, to mention a few. 2,3 One of the main requirements of protein-protein recognition is a specific interaction between the partners. Specifically, each protein should bind strongly to one, or a few, partners and weakly, if at all, to all other biomolecules. The ability to bind exclusively to one protein partner is called specificity, while selectivity refers to proteins that bind more than one partner, some with higher affinity than others. 4 Therefore, as stated by IUPAC, 5 “specificity is the ultimate of selectivity”, and it is the ideal characteristic of artificially designed proteins to be used in medical treatments. The requirement that binding be strong and highly discriminating for the target imposes constraints on the design of binding sites. Moreover, specific protein-protein recognition can rely both on the chemical patterning of the binding surface and on the steric compatibility between the interacting partners in the binding site area.


Given the great importance of protein-protein interactions, much work has been done to understand the key ingredients of molecular recognition. Simple models have been used 6–11 to show how patterned surfaces can be designed to bind selectively to a small number of partners. Numerous studies also focus on artificial molecular recognition, which instead relies on synthetic materials (e.g. polymer-based) designed to mimic the natural recognition of proteins. 12–20 In the present work, we investigate the optimal size range of the binding site in protein-protein interactions for achieving specific recognition. Here, we take into account the combined effect of chemical heterogeneity and steric compatibility on binding specificity. 6,21–34 We simplify the problem by computationally studying a system composed of an explicit protein G (alpha+beta immunoglobulin-binding protein; PDB ID 1PGB) and a protein-like surface, which is an artificial representation of the binding site of an interaction partner. Protein G has 56 residues, 80% of which belong to secondary-structure elements, and has been successfully designed with different protein models. 35,36 We perform Monte Carlo design simulations to identify sequences optimised for protein-protein binding. We employ the caterpillar coarse-grained protein model, 36,37 and we explore, through folding simulations, the specificity and binding precision of optimally shaped protein binding sites. Our results show that binding specificity can indeed be achieved, and that the size range of efficient specific binding sites is close to the experimentally determined natural one (∼750-1500 Å², ref. 1), offering a rationale for such a range. The manuscript is structured as follows. First, we introduce the interaction model used to describe the protein, the simplified system used to model protein-protein binding, and the computational method for design and folding simulations. Then, we present the folding properties of the protein, both in bulk and in the presence of the partner, evaluating the binding affinity and testing the specificity through simulations performed in the presence of a random partner. Finally, we draw the conclusions of our investigation.


Methods

Our work follows the steps pictorially represented in Fig. 1: I) As explicit protein, we choose the extensively investigated protein G. First, we remove the side chains and adjust the crystallographic structure so as to make it compatible with the caterpillar protein model. II) In order to study the influence of the binding site size on protein-protein binding, we design a simplified system consisting of one explicit protein and a protein-like surface. The latter represents an idealised binding site (BS) and is constructed so as to trace the surface profile of the protein partner, thus resulting in a perfect steric match. We construct four systems, differing in BS surface area. III) We design each of the four systems considering protein G and the BS simultaneously. The procedure consists of searching for the ensemble of sequences that minimise the energy of both the protein and the BS while keeping the system conformation frozen in space. IV) After selecting the best designed sequence for each system (see the Design subsection for details about the criterion), we isolate the portion corresponding to the protein G structure and test its folding ability in a single-protein folding simulation. V) We check the binding properties of the designed protein sequences in the presence of the BS frozen in the simulation box (bearing the sequence designed concurrently with the protein). VI) Finally, we test the specificity of the binding by simulating proteins against BSs that have not been designed concurrently.

Computational interaction model

We employ the Caterpillar model, a coarse-grained protein model that has been shown to accurately predict the folding of designed sequences as well as natural ones. 36,37 Within the model, a single amino acid is represented by its backbone atoms only, i.e. the C, O, Cα, N and H atoms (Fig. 2). The backbone degrees of freedom are the Ramachandran torsional angles φ and ψ.


Figure 1: Pictorial representation of the steps employed to generate artificial binding sites for protein G and to test the binding affinity between the two partners when they are artificially designed to optimise binding. I) The protein G structure (PDB ID 1PGB) is adapted to the coarse-grained representation of the caterpillar model by deleting the amino acid side chains. II) Structural information alone is used to construct a simplified partner, representing the binding site (BS) of the protein partner and shaped to trace the surface profile of protein G. Four BSs, differing in surface area, are created. III) De novo design of the protein G-BS sequence in the folded bound configuration. The plot represents the free energy of the sequence space. We select one sequence that is characterised by low energy and therefore optimises the binding. IV) For each system, we isolate the protein G sequence and test its folding ability in a single-protein folding simulation. V) Check of the binding properties of the protein sequences designed in the presence of the BS frozen in the simulation box (bearing the sequence designed concurrently with the protein). VI) Test of the specificity of the binding, simulating proteins against BSs that have not been designed concurrently.


The intramolecular energy for a protein of length N has the form:

$$E_{\mathrm{intra}\text{-}P} = \sum_{i=1}^{N}\sum_{j>i+2}^{N}\left[E_{hb}(\theta_1,\theta_2,r_{O_iH_j}) + E_{hb}(\theta_1,\theta_2,r_{O_jH_i})\right] + \sum_{i=1}^{N}\sum_{j>i+2}^{N}\alpha\,E_{sc}(r_{C\alpha_i C\alpha_j}) + \sum_{i=1}^{N}\left[\alpha\,E_{HOH}\,E_{sol}(\Omega-\Omega_i) + E_{bond}(r_{C_iN_i})\right] \tag{1}$$

where α = 0.44 and E_HOH = 0.015 are two parameters added to balance the relative weight of the energy terms. E_hb(θ1, θ2, r_OH) is a 10-12 Lennard-Jones potential commonly used to represent hydrogen bonds: 38

$$E_{hb}(\theta_1,\theta_2,r_{OH}) = \epsilon_H\,[\cos(\theta_1)\cos(\theta_2)]^{\nu}\left[5\left(\frac{\sigma}{r_{OH}}\right)^{12} - 6\left(\frac{\sigma}{r_{OH}}\right)^{10}\right] \tag{2}$$

where r_OH is the distance between the hydrogen atom of the backbone amide group and the oxygen atom of the backbone carbonyl group; ν = 2; σ = 2 Å and ε_H = 13.64 kBT. θ1 and θ2 are the angles C-O-H and O-H-N, respectively (Fig. 2a), and account for the directionality of hydrogen bonds; the potential reaches its minimum, −ε_H[cos(θ1)cos(θ2)]^ν, at r_OH = σ. E_sc(r_CαiCαj) mimics the side chain-side chain interaction via a square-well-like isotropic potential:

$$E_{sc}(r_{C\alpha_i C\alpha_j}) = \epsilon_{C\alpha_i C\alpha_j}\left[1 - \frac{1}{1+\exp\left(2.5\,(r_h - r_{C\alpha_i C\alpha_j})\right)}\right] \tag{3}$$

where r_CαiCαj is the distance between the Cα atoms and r_h = 12 Å is the distance at which E_sc(r_CαiCαj) = ε_CαiCαj/2.
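To make Eqs. (2) and (3) concrete, the following minimal Python sketch evaluates the two pair potentials with the parameter values quoted above, taking the 10-12 hydrogen-bond form whose minimum is −ε_H at r_OH = σ. Energies are in units of kBT and distances in Å. The pair strength ε_CαiCαj is passed in as an argument because its values are not listed here, and the function names are illustrative, not taken from the authors' code.

```python
import math

# Parameters quoted in the text (energies in kB*T, lengths in Å).
EPS_H = 13.64  # hydrogen-bond strength epsilon_H
SIGMA = 2.0    # hydrogen-bond equilibrium distance sigma
NU = 2         # exponent of the angular prefactor
R_H = 12.0     # distance at which the side-chain switch equals 1/2


def e_hb(theta1: float, theta2: float, r_oh: float) -> float:
    """10-12 hydrogen-bond potential of Eq. (2); angles in radians."""
    angular = (math.cos(theta1) * math.cos(theta2)) ** NU
    x = SIGMA / r_oh
    return EPS_H * angular * (5 * x**12 - 6 * x**10)


def switching(r: float) -> float:
    """1 - 1/(1 + exp(2.5 (r_h - r))): ~1 for r << r_h, 1/2 at r_h, ~0 beyond."""
    return 1.0 - 1.0 / (1.0 + math.exp(2.5 * (R_H - r)))


def e_sc(r_ca: float, eps_pair: float) -> float:
    """Square-well-like side-chain term of Eq. (3) for one Cα-Cα pair."""
    return eps_pair * switching(r_ca)


# Perfectly aligned hydrogen bond at r = sigma: energy -epsilon_H.
print(e_hb(0.0, 0.0, SIGMA))
```

The same switching function reappears in the contact count Ω_i of Eq. (4), which is why it is factored out here.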

E_sol(Ω − Ω_i) is an implicit-solvent energy term that acts as an energy penalty if a hydrophobic (hydrophilic) amino acid is exposed (buried), and has the form:

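$$E_{sol}(\Omega-\Omega_i) = \begin{cases} \epsilon^{i}_{sol}\,[\Omega-\Omega_i] & \Omega_i \le \Omega \\ 0 & \Omega_i > \Omega \end{cases}, \qquad \Omega_i = \sum_{j=1}^{N}\left[1 - \frac{1}{1+\exp\left(2.5\,(r_h - r_{C\alpha_i C\alpha_j})\right)}\right] \tag{4}$$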

where Ω = 21 (ref. 37) is a threshold on the number of contacts in the native structure above which an amino acid is considered fully buried, and ε^i_sol is the Doolittle hydrophobicity index of amino acid i. 39 E_bond(r_CN) = k(r_CN − r_CN^ref)^2 is a harmonic bonding term with elastic constant k = 20 kBT Å^-2 that keeps the distance r_CN, along with the C-Cα-N backbone angle, fixed.
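A matching sketch for the solvent and bond terms, in the same units. The sum in Ω_i reuses the switching function of Eq. (3). Two points are our assumptions rather than statements of the text: the self term j = i is skipped (Eq. 4 writes the sum over j = 1..N), and ε^i_sol is passed in as a plain number per residue.

```python
import math

R_H = 12.0     # Å, same switching distance as in Eq. (3)
OMEGA = 21.0   # burial threshold Omega
K_BOND = 20.0  # kB*T / Å^2, elastic constant of the C-N bond


def contact_number(i, coords):
    """Omega_i of Eq. (4): smooth count of the Cα neighbours of residue i.

    coords is a list of (x, y, z) Cα positions in Å. Assumption: the self
    term j == i is skipped.
    """
    total = 0.0
    for j, r_j in enumerate(coords):
        if j == i:
            continue
        r = math.dist(coords[i], r_j)
        total += 1.0 - 1.0 / (1.0 + math.exp(2.5 * (R_H - r)))
    return total


def e_sol(omega_i, eps_sol_i):
    """Eq. (4): linear in (Omega - Omega_i) below the threshold, zero above it."""
    return eps_sol_i * (OMEGA - omega_i) if omega_i <= OMEGA else 0.0


def e_bond(r_cn, r_cn_ref):
    """Harmonic C-N bond term k (r_CN - r_CN^ref)^2."""
    return K_BOND * (r_cn - r_cn_ref) ** 2
```

A residue with a positive hydrophobicity index thus pays an energy proportional to how far its contact count falls below Ω, and nothing once it is fully buried.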

Binding site modelling

We model a protein-like surface that represents an idealised binding site between two natural proteins. The surface is constructed by pushing the protein onto a flat mesh of self-avoiding beads (self-avoiding radius r_SA = 2 Å) in order to obtain a mould. The mesh is made of beads arranged on a 2D square lattice and is initially placed on the z = 0 plane. We adopt a high mesh density (step 0.5 Å) to guarantee that the protein does not cross the mesh at any point in the folding simulation. We first determine the size of the protein by measuring the maximum CM-Cα distance in protein G, i.e. the maximum protein radius, r_MAX ∼ 16 Å. We then use it to normalise z_CM, the height of the CM with respect to the z = 0 plane, so that ζ = z_CM/r_MAX. Hence, we push the protein into the z = 0 plane until the CM reaches the desired relative height ζ. The mesh is simultaneously pushed downwards, keeping a minimum distance µ = 5 Å between protein and mesh points, so as to shape the BS exactly like the protein surface without creating overlaps between the residues. To avoid large gaps within the BS, we perform an iterative smoothing procedure. We compute the distance of each bead of the mesh to all its neighbours on the xy square lattice. If the distance is ≥ r_SA, we shift the z coordinate of the bead so as to fill the gap. We repeat the procedure for various ζ values, and for each of them we perform a rotational analysis to identify the orientation that maximises the BS surface area. We align the z axis with the minor inertia axis of the protein and apply a discrete set of rotations around the perpendicular axes. We recompute the mould for each orientation, map the surface onto a triangular mesh and evaluate its area by summing the areas of all triangles formed by triplets of mesh points at z ≠ 0. It is important to note that the surface area decreases with increasing ζ (see Fig. 3). We choose the values ζ = 0.2, 0.4, 0.6 and 0.8, obtaining binding sites with surface areas of 1688, 1323, 984 and 649 Å², respectively. Among the mesh points, we isolate a subset C_Surf of homogeneously distributed points, at a distance of at least δ = 5 Å from each other, that will represent the BS residues. The value δ = 5 Å is derived from the typical nearest-neighbour distance between two residues in natural proteins (see Fig. FS2). At first, we include all beads of the mesh in the C_Surf selection. Then, we loop over all the points, starting from a corner of the square lattice, and for each C_Surf point we deselect, i.e. exclude from C_Surf, the beads within a sphere of radius δ. Beads that are not in the C_Surf selection are skipped. Lastly, we also deselect the beads at z = 0. The surface residues in the C_Surf set interact with the protein residues according to Eq. (3) of the Caterpillar model. The binding site is kept frozen at all stages of our simulations; therefore, we do not give the C_Surf residues the ability to form hydrogen bonds. The binding site-water interaction energy, defined in Eq. (4), requires the number of neighbours Ω_i of each residue.
However, in our representation the BS is made of a single layer of amino acids rather than being part of a folded globular protein. This leads us to neglect the contribution to Ω_i that arises from the parts of the protein that we do not represent explicitly. Therefore, we artificially add to the amino acids of the BS an offset number of neighbours that is compatible with amino acids exposed on the surface of real proteins (see Fig. 4 for details about the offset evaluation).

Figure 2: a) Representation of the caterpillar model: the purple circles centred on the Cα atoms represent the self-avoidance volume of the amino acid, which has a radius of 2 Å. The backbone degrees of freedom are the torsional angles φ and ψ. In order to describe hydrogen bonds, the backbone amide (NH) and carbonyl (CO) groups are also explicitly considered. b) Pictorial representation of the binding site. Gray dots are self-avoiding beads; dark blue spheres are Cα atoms of the binding site; light blue spheres are the Cα atoms of the protein. The red spot represents the centre of mass of the protein, and its height from the plane defined by the self-avoiding beads is z_CM. r_MAX is the maximum CM-Cα distance and is used as normalisation factor for the relative CM height ζ = z_CM/r_MAX. µ is the minimum Cα(protein)-Cα(binding site) distance. δ is the minimum distance between two activated Cα of the binding site.
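Two geometric steps of the binding-site construction, the area of the triangulated mould and the selection of the homogeneously spaced set C_Surf, can be sketched in pure Python as follows. This is a minimal illustration under our own assumptions: the mould is stored as a hypothetical height map z[ix][iy] on the square lattice, each lattice cell is split into two triangles, a triangle counts only if none of its vertices lies on the z = 0 plane, and the bead list is ordered starting from a lattice corner, as in the text.

```python
import math


def triangle_area(p, q, r):
    """Area of the 3D triangle pqr as half the norm of the cross product."""
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def mould_area(z, step=0.5):
    """Surface area of a height map z[ix][iy] on a square lattice of spacing `step` (Å).

    Each lattice cell is split into two triangles; a triangle is counted only
    if none of its three vertices lies on the z = 0 plane.
    """
    area = 0.0
    for ix in range(len(z) - 1):
        for iy in range(len(z[0]) - 1):
            p00 = (ix * step, iy * step, z[ix][iy])
            p10 = ((ix + 1) * step, iy * step, z[ix + 1][iy])
            p01 = (ix * step, (iy + 1) * step, z[ix][iy + 1])
            p11 = ((ix + 1) * step, (iy + 1) * step, z[ix + 1][iy + 1])
            for tri in ((p00, p10, p11), (p00, p11, p01)):
                if all(point[2] != 0.0 for point in tri):
                    area += triangle_area(*tri)
    return area


def select_surface_residues(beads, delta=5.0):
    """Greedy thinning into C_Surf. beads: list of (x, y, z) in Å, ordered from
    a lattice corner. Any two kept beads end up at least `delta` apart."""
    keep = [True] * len(beads)
    for i, bead in enumerate(beads):
        if not keep[i]:
            continue  # beads already excluded from C_Surf are skipped
        for j in range(i + 1, len(beads)):
            if keep[j] and math.dist(bead, beads[j]) < delta:
                keep[j] = False
    # Finally drop the beads that were never pushed down (z = 0).
    return [b for b, k in zip(beads, keep) if k and b[2] != 0.0]
```

Only beads after i need to be checked: any earlier bead still in the selection would already have removed bead i if it were closer than δ.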

Figure 3: Binding sites constructed at µ = 5 Å, δ = 5 Å and different ζ. The red spot is located at the CM of the protein. The surface areas lie close to the range of typical natural sizes (750-1500 Å², ref. 1) and decrease with increasing ζ.

Design

Stage III) of Fig. 1 consists of the combined design of the protein-BS system. Artificial design consists of an extensive sampling of the sequence space in order to identify the ensemble of sequences that will fold into a specific target structure. Although the C_Surf residues are not arranged along a protein chain, we refer to their chemical identities as the BS sequence. We apply the Virtual Move Parallel Tempering Monte Carlo (VMPT) procedure presented in the works of Coluzza. 36,37 The target configuration is the folded bound protein, taken directly from the modelling procedure described above. The design is carried out considering protein and BS as a single object with a single amino acid sequence, thereby optimising the binding interactions. Following a conventional Metropolis scheme, we perform single-point mutations and amino acid swaps along the chain. We use parallel tempering, simulating the same system at different temperatures and swapping sequences between the replicas on the fly (acceptance probability in Eq. 8), which helps the system overcome energy barriers and therefore improves sampling. In our implementation we simulate 16 replicas with the set of temperatures (10.000; 5.000; 2.000; 1.000; 0.500; 0.333; 0.250; 0.200; 0.167; 0.143; 0.125; 0.111; 0.100; 0.091; 0.083; 0.077) in units of kB. Moreover, at each temperature, we collect statistics using the information coming from all other replicas, according to the virtual move scheme described in ref. 40. The best candidate sequences for folding are those that minimise the total energy of the target structure among the ones maximising the total number of permutations Np, given by:

$$N_p = \frac{N!}{\prod_{i=1}^{q} n_i!} \tag{5}$$

In Eq. 5, q = 18 is the alphabet size (proline and cysteine are not included in the design, due to their peculiar role in protein structure, which is beyond the scope of our model); N is the total number of monomers in the protein and the BS together; n_i is the total number of monomers of type i. The sampling is further enhanced through the use of an adaptive bias potential W[E, ln Np, T], which is a function of the potential energy E and the number of permutations Np. The potential is constructed on the fly and biases the acceptance probabilities so as to lead the system to explore regions of sequence space that have not yet been sampled. The acceptance probabilities for amino acid replacement, swap of amino acid identity along the chain and parallel tempering are, respectively:

$$P^{rep}_{acc} = \min\left\{1, \exp\left[\Delta W - \left(\Delta E - E_p \ln\frac{N_p^{new}}{N_p^{old}}\right)\Big/k_BT\right]\right\} \tag{6}$$

$$P^{swap}_{acc} = \min\left\{1, \exp\left[\Delta W - \Delta E/k_BT\right]\right\} \tag{7}$$

$$P^{T\,swap}_{acc} = \min\left\{1, \exp\left[\Delta W + \left(\Delta E - E_p \ln\frac{N_p^{new}}{N_p^{old}}\right)\Big/k_B\Delta T\right]\right\} \tag{8}$$

where ΔW, ΔE and ln(Np^new/Np^old) are the bias potential, energy and number-of-permutation differences between the new and old states; Ep = 20 kBT is an effective temperature that drives the system towards highly heterogeneous sequences (we refer to ref. 36 for further details).
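The sequence-space quantities above can be sketched in a few lines of Python. ln Np is evaluated with math.lgamma (ln n! = lgamma(n + 1)) to avoid overflowing N! for a chain of N = 56 + 50 = 106 monomers in the largest system, and the two Metropolis rules follow Eqs. (6) and (7) literally, with energies in units of kBT. This is an illustrative sketch, not the VMPT implementation of refs. 36,37: the bias difference ΔW is simply passed in, sequences are assumed to be strings of one-letter codes, and the parallel-tempering rule of Eq. (8) is omitted.

```python
import math
import random
from collections import Counter


def log_permutations(sequence):
    """ln Np of Eq. (5): ln N! - sum_i ln n_i!, using lgamma(n + 1) = ln n!."""
    counts = Counter(sequence)
    return math.lgamma(len(sequence) + 1) - sum(math.lgamma(n + 1) for n in counts.values())


def acc_replacement(d_w, d_e, log_np_new, log_np_old, kT, e_p=20.0):
    """Eq. (6): acceptance of a point mutation; heterogeneity gains lower the cost."""
    arg = d_w - (d_e - e_p * (log_np_new - log_np_old)) / kT
    return 1.0 if arg >= 0 else math.exp(arg)  # min{1, exp(arg)} without overflow


def acc_swap(d_w, d_e, kT):
    """Eq. (7): swapping two identities leaves the composition, hence Np, unchanged."""
    arg = d_w - d_e / kT
    return 1.0 if arg >= 0 else math.exp(arg)


def accept(p, rng=random):
    """Draw the Metropolis decision for acceptance probability p."""
    return rng.random() < p


# Mutating "AAA" -> "AAB" raises ln Np from 0 to ln 3, offsetting part of dE.
print(acc_replacement(0.0, 20.0, log_permutations("AAB"), log_permutations("AAA"), 1.0))
```

Because Ep multiplies the change in ln Np, a mutation that increases compositional diversity can be accepted even when it raises the energy, which is the mechanism that keeps the designed sequences heterogeneous.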

Folding

As for the design, the strategy adopted for the folding simulations is inspired by the one described in refs. 36,37. We perform two types of folding simulations: single-protein folding, where we apply geometrical deformations to the structure of the artificially designed protein G in order to identify its most stable conformation; and protein-BS folding, consisting of deformations of the artificial protein in the presence of a BS frozen in the simulation box, to test whether protein folding is affected by the BS and to measure the binding affinity of the pair. For single-protein folding, we directly apply the VMPT with adaptive umbrella sampling already employed for the refolding of several natural and artificial proteins, 36,37 including protein G. The set of temperatures used is T = (2.0; 1.8; 1.6; 1.4; 1.3; 1.2; 1.1; 1.0; 0.9; 0.8; 0.7; 0.65; 0.6; 0.55; 0.5; 0.45), which ensures communication between the replicas and spans an interval that allows us to observe frequent folding-unfolding events. The simulations start from a fully stretched protein, whose conformations are sampled through crankshaft and pivot moves. The acceptance probability for each geometrical move is generically:

$$P_{acc} = \min\left\{1, \exp\left[\Delta W - \Delta E/k_BT\right]\right\} \tag{9}$$

where ΔW and ΔE are the bias potential and energy differences. The bias for the adaptive umbrella sampling, W[DRMSD, HB], is a function of the distance root mean square displacement DRMSD (see the description below, Eq. 10) and of the number of hydrogen bonds HB, which counts the oxygen-hydrogen pairs that are closer than 2.5 Å, are oriented towards each other (angular-dependent prefactor in Eq. 2 ≠ 0) and are further apart than second neighbours along the chain. For the protein-BS simulations, we use a cubic box (box side ∼340 Å) with two replicas of the BS placed symmetrically, as shown in Fig. FS3. The two BSs are mirror images of each other, and the protein cannot enter the region between them. As the caterpillar model is chirally invariant, we populate both target enantiomers equally, and each binds to its corresponding binding site. We start from the fully stretched protein, keeping the BSs frozen and performing geometrical moves on the protein. In addition to crankshaft and pivot moves, we also perform global rotations and translations of the protein, to let it reorient rigidly and diffuse in the box, and mirroring moves to switch between left- and right-handed configurations, since the caterpillar model does not take Cβ chirality into account. We simulate 32 replicas of the system at different temperatures, T = (4.9; 4.6; 4.3; 3.9; 3.5; 3.1; 2.8; 2.55; 2.35; 2.2; 2.05; 1.9; 1.75; 1.6; 1.45; 1.3; 1.1; 1.0; 0.98; 0.95; 0.92; 0.88; 0.85; 0.82; 0.79; 0.76; 0.73; 0.70; 0.67; 0.63; 0.59; 0.55), a range that ensures multiple binding-unbinding events.
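The geometric moves listed above can be illustrated with a minimal pure-Python sketch of a torsional pivot move and a mirroring move on a list of backbone atom positions. This is a generic illustration under our own assumptions (the pivot axis is the bond between atoms pivot−1 and pivot, the rotation angle is drawn uniformly in ±π/6, and the mirror plane passes through the centre of mass); it is not the authors' move set. Crankshaft moves, which rotate an internal segment about the axis joining its two end atoms, work analogously. Any trial configuration produced this way would then be accepted or rejected with Eq. (9).

```python
import math
import random


def rotate_about_axis(point, origin, axis, angle):
    """Rodrigues rotation of `point` by `angle` (radians) about the line through
    `origin` with direction `axis` (need not be normalised)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    v = [p - o for p, o in zip(point, origin)]
    c, s = math.cos(angle), math.sin(angle)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1 - c) for i in range(3)]
    return tuple(o + r for o, r in zip(origin, rotated))


def pivot_move(chain, pivot, max_angle=math.pi / 6, rng=random):
    """Rotate every atom after index `pivot` about the bond chain[pivot-1] -> chain[pivot].

    Bond lengths and the angle at `pivot` are preserved because both axis atoms stay fixed.
    """
    if not 1 <= pivot < len(chain) - 1:
        raise ValueError("pivot must have atoms on both sides")
    origin, end = chain[pivot - 1], chain[pivot]
    axis = [e - o for e, o in zip(end, origin)]
    angle = rng.uniform(-max_angle, max_angle)
    return chain[: pivot + 1] + [rotate_about_axis(p, origin, axis, angle)
                                 for p in chain[pivot + 1:]]


def mirror_move(chain):
    """Reflect the whole chain through the plane x = x_CM, swapping its handedness."""
    x_cm = sum(p[0] for p in chain) / len(chain)
    return [(2 * x_cm - x, y, z) for x, y, z in chain]
```

Because the caterpillar energy is chirally invariant, a mirror move leaves the energy unchanged and simply lets the chain hop between the two enantiomeric binding modes.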


The order parameters for the adaptive bias potential are the intra-protein DRMSD_intra and the protein-BS DRMSD_inter. Since DRMSD_intra monitors single-protein folding, it is the same order parameter used in the single-protein simulations described above. DRMSD measures the distance of a structure from a target one and is defined as follows:

$$DRMSD = \sqrt{\frac{1}{C}\sum_{ij}\left(|\Delta\vec{r}_{ij}| - |\Delta\vec{r}^{\,T}_{ij}|\right)^2} \tag{10}$$

where the sum runs over the i-j pairs in contact in the target structure (native contacts, cutoff 17 Å, as in ref. 36), Δr_ij is the distance between residues i and j belonging to the same (DRMSD_intra) or to different (DRMSD_inter) proteins, and Δr_ij^T is the same distance calculated in the target structure. C is a normalisation factor, equal to the number of native contacts for DRMSD_inter and to twice the number of native contacts for DRMSD_intra. DRMSD_inter is always evaluated with respect to the binding site replica closest to the protein. We monitor the progress of the simulations by projecting the configurational space onto the DRMSD_intra and DRMSD_inter collective variables. Previous works 36,37 showed that stable folded configurations correspond to a global free energy minimum with DRMSD_intra ≤ 2 Å. From the free energy landscapes, plotted as a function of both DRMSD_intra and DRMSD_inter, we can estimate the relative stability of folded-bound (low intra and low inter DRMSD), unfolded-bound (high intra and low inter), folded-unbound (low intra and high inter) and unfolded-unbound (high intra and high inter) states. Defining Q_b as the partition function of all bound protein conformations (with at least one protein-BS contact) and Q_f as the partition function of the protein free in bulk (i.e. in the box volume where no contact with the BS is possible), one can measure the stability of protein-BS binding by means of the association constant Ka. 10 We computed the association constant as Ka = exp(−ΔF/kBT) V_box/n, where V_box is the accessible volume of our simulation box, n is the number of binding sites and ΔF = −kBT ln(Q_b/Q_f) is the binding free energy (see Fig. FS3 for details).
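The analysis quantities of this subsection can be sketched as follows: DRMSD of Eq. (10) over a list of native contacts, and the conversion of a bound/free partition-function ratio into Ka. Since ΔF = −kBT ln(Q_b/Q_f), the exponential cancels and Ka = (Q_b/Q_f) V_box/n. The helper names are illustrative. In practice Q_b/Q_f would come from the reweighted free-energy histograms of the biased simulations, which we do not reproduce here. The Å³-to-litre and Avogadro factors are our addition, used to express Ka in the l/mol units of the Abstract.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol
LITRES_PER_A3 = 1e-27     # 1 Å^3 = 1e-30 m^3 = 1e-27 l


def native_contacts(target, cutoff=17.0):
    """Index pairs i < j closer than `cutoff` Å in the target structure."""
    n = len(target)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(target[i], target[j]) < cutoff]


def drmsd(coords, target, contacts, c_norm):
    """Eq. (10). coords/target: lists of (x, y, z) in Å; contacts: native (i, j)
    pairs; c_norm: the normalisation factor C chosen as described in the text."""
    total = 0.0
    for i, j in contacts:
        d = math.dist(coords[i], coords[j]) - math.dist(target[i], target[j])
        total += d * d
    return math.sqrt(total / c_norm)


def association_constant(q_b_over_q_f, v_box_a3, n_sites):
    """Ka = exp(-dF/kT) V_box / n = (Qb/Qf) V_box / n, converted from Å^3 to l/mol."""
    return q_b_over_q_f * v_box_a3 / n_sites * LITRES_PER_A3 * AVOGADRO
```

For illustration only: if the whole cubic box of side 340 Å were accessible and n = 2, a ratio Q_b/Q_f = 1 would give association_constant(1.0, 340**3, 2) ≈ 1.2 × 10^4 l/mol; the accessible volume in the actual simulations is smaller than the full box.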


Results and discussion

We construct four BSs with different relative heights ζ = 0.20, 0.40, 0.60 and 0.80 for our selected target, protein G. Fig. 3 shows the resulting binding-site surfaces, where two main features are visible. First, the size of the surface decreases with increasing ζ. Second, the roughness of the surface closely matches the profile of the protein, resulting in a pattern of steric repulsions that, as we show later, is essential for the specificity of the BS. The BSs with ζ = 0.20, 0.40, 0.60 and 0.80 comprise 50, 40, 28 and 17 residues, respectively. For each binding site, we design the residues of both the BS and the protein in the folded and bound configuration, so as to optimise the interaction energy. The sets of solutions produced by the design algorithm differ substantially between systems, as shown by comparing the ensembles of solution sequences (see Fig. FS1 in the section "Sequence solution basins" of the SI). In Fig. 4 we show how the binding site influences the solvent exposure pattern of the protein. The plots show a significant influence even for the smallest BS, at ζ = 0.80. Whether such an alteration is sufficient to encode specific binding is what we test next.

From the basin of each designed system, we select the sequence with the largest number of permutations Np and the lowest potential energy E, which is the criterion employed in Ref. 36 to select the best candidate for folding. We first test the folding properties of these sequences in single-protein simulations, monitoring the folding free energy as a function of the DRMSD from the native protein G structure (see Fig. 5). Below a common folding temperature, all sequences fold and show a global free energy minimum below DRMSD = 2 Å, a value typical of folded configurations (Ref. 37). The configurations in the global minimum are also shown in Fig. 5, compared with the native protein G structure: the designed sequences refold into the target configuration with an RMSD of 2.7, 3.3, 1.0 and 1.9 Å for ζ = 0.20, 0.40, 0.60 and 0.80, respectively. We estimate the reduced folding temperature as T_F ≈ 1.3 for all systems, taking it as the temperature at which the free energy shows two equally deep minima, one below and one above DRMSD = 2 Å (see Fig. FS5 in the SI).

Figure 4: Water exposure profiles for the different binding sites: number of contacts, i.e. neighbours, of each protein G residue. The orange horizontal line, Ω = 21, is the threshold on the number of neighbours above which an amino acid is considered buried in the protein core. The purple line accounts for protein-protein and protein-BS contacts, while the green line accounts for protein-protein contacts only. A mismatch between the two lines reveals a strong impact of the BS on the water exposure of the corresponding protein residues. The colour scheme of the protein in the inset reflects the total number of contacts: solvent-exposed regions in red, buried regions in blue. The exposure profile of protein G is used to evaluate an offset that corrects the number of neighbours of each BS residue (the blue spheres in the insets). We calculate the volume of the part of a sphere, centred on the BS Cα, that lies under the mesh surface and multiply it by the average amino acid density in globular proteins (ρ = 0.011687 aa/Å³, evaluated over a set of 145 globular proteins), thus obtaining the offset for the BS number of neighbours. We optimise the radius of the sphere so that the average BS exposure is similar to that of the most exposed residues of protein G (the minima of the green line in the plot), which gives a sphere radius of 6 Å.
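
The neighbour-count offset described in the Figure 4 caption (the part of a 6 Å sphere centred on a BS Cα that lies under the surface, multiplied by ρ = 0.011687 aa/Å³) can be sketched with a simple Monte Carlo estimate. This is an illustration under stated assumptions, not the authors' implementation: it hypothetically treats the mesh as a single-valued height field z = h(x, y) with the buried side below it, and the function names are our own. For a flat surface through the Cα, half of the sphere is buried and the offset should come out close to ρ·(2/3)π·6³ ≈ 5.29.

```python
import math
import random
from typing import Callable, Tuple

RHO_AA = 0.011687   # average amino-acid density of globular proteins [aa/Å^3] (from the text)
R_SPHERE = 6.0      # optimised sphere radius [Å] (from the text)


def volume_below_surface(center: Tuple[float, float, float],
                         height: Callable[[float, float], float],
                         radius: float = R_SPHERE,
                         n_samples: int = 100_000,
                         seed: int = 0) -> float:
    """Monte Carlo estimate of the sphere volume lying below z = height(x, y).

    Assumption (hypothetical): the surface is a single-valued height field.
    A real triangulated mesh would need a point-in-solid test instead.
    """
    rng = random.Random(seed)
    cx, cy, cz = center
    inside = below = 0
    while inside < n_samples:
        # Rejection sampling: draw from the bounding cube, keep points inside the sphere.
        dx, dy, dz = (rng.uniform(-radius, radius) for _ in range(3))
        if dx * dx + dy * dy + dz * dz > radius * radius:
            continue
        inside += 1
        if cz + dz < height(cx + dx, cy + dy):
            below += 1
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    return sphere_volume * below / inside


def neighbour_offset(center, height, radius=R_SPHERE, rho=RHO_AA, **kwargs) -> float:
    """Hypothetical helper: offset added to a BS residue's neighbour count (volume x density)."""
    return rho * volume_below_surface(center, height, radius, **kwargs)
```

In this sketch, the sphere radius would then be tuned, as in the caption, until the average corrected BS exposure matches that of the most exposed protein G residues.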

Figure 5: Single-protein folding free energy profiles [F/k_BT] at reduced temperature 0.55 as a function of the DRMSD [Å] from the native target structure (PDB ID: 1PGB). Different colours represent the folding free energy of the protein sequence obtained via the design procedure in the presence of the BS with the ζ value given in the key. Proteins are simulated in an empty box. Configurations corresponding to the free energy minimum of each system are shown in red, compared with the native protein G (in green).
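
The DRMSD used as order parameter compares internal distance matrices, so no structural superposition is needed. Its exact definition is not given in this section; the sketch below assumes the common form DRMSD = sqrt(2/(N(N−1)) Σ_{i<j} (d_ij − d_ij^ref)²) over all bead pairs (for example Cα positions, in Å). The `drmsd_inter` variant, which uses only protein-binding-site pairs, is a hypothetical reading of the inter-DRMSD used in Fig. 6.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(coords: Sequence[Point], ref: Sequence[Point]) -> float:
    """Distance RMSD over all intra-chain pairs i < j (assumed definition)."""
    if len(coords) != len(ref) or len(coords) < 2:
        raise ValueError("need two conformations of the same chain with >= 2 beads")
    sq = [(math.dist(coords[i], coords[j]) - math.dist(ref[i], ref[j])) ** 2
          for i, j in combinations(range(len(coords)), 2)]
    return math.sqrt(sum(sq) / len(sq))


def drmsd_inter(prot: Sequence[Point], site: Sequence[Point],
                prot_ref: Sequence[Point], site_ref: Sequence[Point]) -> float:
    """Hypothetical inter variant: distance RMSD over all protein-site pairs only."""
    if len(prot) != len(prot_ref) or len(site) != len(site_ref) or not prot or not site:
        raise ValueError("mismatched or empty coordinate sets")
    sq = [(math.dist(p, s) - math.dist(pr, sr)) ** 2
          for p, pr in zip(prot, prot_ref)
          for s, sr in zip(site, site_ref)]
    return math.sqrt(sum(sq) / len(sq))
```

Because only distances enter, a rigidly translated or rotated copy of the reference gives DRMSD = 0, which is why this measure suits comparisons with the native structure without alignment.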

Binding affinity

We now turn to the binding of the protein to the artificial BS. In Fig. 6 we show the free energy landscape as a function of the intra- and inter-DRMSD, which separates the contribution of protein folding from that of protein binding. The bottom-left corner of the landscape (small DRMSD_intra and DRMSD_inter) corresponds to structures in the target folded and bound state, whereas the opposite corner (large DRMSD_intra and DRMSD_inter) contains completely unfolded and unbound configurations. The free energy landscapes demonstrate that the protein folds and binds correctly for all binding sites, remarkably even for the small ones. Fig. 7 shows log10(K_a) as a function of the inverse temperature 1/T for the different BSs.

The first observation we deduce from Fig. 7 is that the BS size has a major influence on the binding affinity, with larger sites binding more strongly than smaller ones. The representation in Fig. 7 has the advantages of a van't Hoff plot (Refs. 41, 42): it shows whether a process is enthalpy- or entropy-driven, and it allows the binding constants to be compared directly with typical experimental values. In a van't Hoff plot,

$$\ln K_{\mathrm{eq}} = -\frac{\Delta H^\circ}{R T} + \frac{\Delta S^\circ}{R},$$

where ΔH° and ΔS° are the standard changes in enthalpy and entropy and R is the ideal gas constant, so that, as a function of 1/T, the slope is −ΔH°/R and the intercept is ΔS°/R (plotting log10 K instead only rescales both by 1/ln 10, without changing their signs). Therefore, in Fig. 7 a positive slope indicates an exothermic process (ΔH° < 0) and a negative slope an endothermic one, while the intercept gives the corresponding entropy change. We can distinguish different regimes at different values of 1/T. At the highest values of 1/T (low T) we find a positive slope, consistent with an exothermic process, with a negative entropy change consistent with the confinement of a folded protein in the correct position inside the binding site (as confirmed by Fig. FS8 in the SI). The slope decreases with the surface area of the binding site, reaching its minimum for the smallest surface, ζ = 0.80. Approaching the unfolding temperature (1/T_F ≈ 0.77), the process becomes entropy-driven and the enthalpy contribution nearly vanishes. All the profiles indicate that upon binding the protein, although folded, experiences an increase in entropy. Since our simulations are performed in implicit solvent and below T_F the proteins are mostly folded, this increase in entropy must be due to misbound states. It is important to stress that the translational entropy is, by definition, not included in the association constant K_a.
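
Reading ΔH° and ΔS° off a van't Hoff plot amounts to a straight-line fit over a window where the curve is close to linear (Fig. 7 shows several regimes, so the window matters). The sketch below is a minimal illustration, not the authors' analysis: it assumes reduced units with k_B = 1, so that ln K = −ΔH°/T + ΔS°, and fits log10 K against 1/T by ordinary least squares; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def vant_hoff_fit(inv_T: Sequence[float], log10_K: Sequence[float]) -> Tuple[float, float]:
    """Least-squares van't Hoff fit in reduced units (k_B = 1, an assumption).

    Model: log10 K = (-dH / ln 10) * (1/T) + dS / ln 10.
    Returns (dH, dS). A positive slope of log10 K vs 1/T gives dH < 0 (exothermic).
    """
    n = len(inv_T)
    if n != len(log10_K) or n < 2:
        raise ValueError("need at least two (1/T, log10 K) points")
    mx = sum(inv_T) / n
    my = sum(log10_K) / n
    sxx = sum((x - mx) ** 2 for x in inv_T)
    sxy = sum((x - mx) * (y - my) for x, y in zip(inv_T, log10_K))
    slope = sxy / sxx
    intercept = my - slope * mx
    ln10 = math.log(10.0)
    # Convert the log10 slope/intercept back to the natural-log van't Hoff form.
    return -ln10 * slope, ln10 * intercept
```

Applied to the low-temperature branch of a curve in Fig. 7, a positive fitted slope would return a negative dH (exothermic) together with the sign of dS, matching the qualitative reading given in the text.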
Below 1/T_F (i.e. above the folding temperature), all systems, most evidently ζ = 0.20, show a smooth shoulder corresponding to a new exothermic process caused by the simultaneous unbinding and unfolding of the protein: the larger the binding-site surface, the higher the temperature of the transition. The temperature range relevant for biological processes lies above the ambient temperature T_A and below the folding temperature T_F; for this reason we mark these two temperatures with a red and a grey dashed line, respectively, in Fig. 7.
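
The folding temperature T_F ≈ 1.3 marked in Fig. 7 was located, as described earlier in this section, as the temperature at which the single-protein free energy shows two equally deep minima, one below and one above DRMSD = 2 Å. A minimal sketch of that criterion follows, assuming F(DRMSD) profiles (in units of k_BT) are available on a grid at a few temperatures; the function names and the linear interpolation between sampled temperatures are our own illustrative choices.

```python
from typing import Sequence


def minima_gap(drmsd_grid: Sequence[float], f_profile: Sequence[float],
               cutoff: float = 2.0) -> float:
    """F_min(DRMSD >= cutoff) - F_min(DRMSD < cutoff), in units of k_B T.

    Positive when the folded basin (DRMSD < cutoff, here 2 Å) is the deeper one.
    """
    folded = [f for d, f in zip(drmsd_grid, f_profile) if d < cutoff]
    unfolded = [f for d, f in zip(drmsd_grid, f_profile) if d >= cutoff]
    if not folded or not unfolded:
        raise ValueError("the profile must cover both sides of the cutoff")
    return min(unfolded) - min(folded)


def folding_temperature(temps: Sequence[float], gaps: Sequence[float]) -> float:
    """Temperature where the two minima are equally deep (gap = 0), by linear interpolation."""
    pairs = sorted(zip(temps, gaps))
    if not pairs:
        raise ValueError("no data")
    if pairs[0][1] == 0.0:
        return pairs[0][0]
    for (t0, g0), (t1, g1) in zip(pairs, pairs[1:]):
        if g1 == 0.0:
            return t1
        if g0 * g1 < 0.0:
            # Interpolate the gap linearly between the two bracketing temperatures.
            return t0 + (t1 - t0) * g0 / (g0 - g1)
    raise ValueError("the gap does not change sign in the sampled range")
```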

In the biologically relevant region, the binding site for ζ = 0.80 (corresponding to an area of 649 Å²) has, close to the ambient temperature T_A, a binding affinity log10 K_a just above zero. Hence we deduce that ζ = 0.80 is the smallest BS that can still confine the protein in the correct configuration, and it provides an estimate of the lower size limit for suitable binding sites. This estimate is comparable to what has been observed experimentally so far, where the smallest surface area of natural BSs is 750 Å² (Ref. 1). Conversely, for ζ = 0.40 and ζ = 0.20 we observe very large binding constants, K_a ≥ 10^20 l/mol, all the way up to the folding temperature. These values fall outside the range observed in nature for protein-protein complexes, 10^5 ≤ K_a ≤ 10^14 l/mol (Ref. 43). Such high values can be explained by the fact that our design scheme aims at optimal binding, based on minimisation of the overall energy. In contrast to nature, our artificial evolution scheme selects only for optimal binding, neglecting additional constraints due, for instance, to protein functionality. Moreover, we do not allow conformational changes of one of the partners, thereby neglecting its contribution to the binding affinity through the entropy lost upon binding. These aspects, i.e. additional constraints and entropy loss, would partially reduce the binding constant. Interestingly, the largest surface area observed in natural BSs, 1500 Å² (Ref. 1), falls between those of the ζ = 0.40 and ζ = 0.20 BSs, 1324 Å² and 1688 Å², respectively. This result suggests that larger surfaces are not observed in nature because they are unlikely to offer any significant biological advantage and could potentially lead to irreversible binding. Another intriguing property is the stereoselectivity of the binding sites with respect to the chirality of the protein.
Since the two replicas of the binding site have opposite chirality, each binds a folded state of different chirality (see Fig. FS4 in the SI). Combined with the impossibility for a folded protein to have mixed chirality, this result implies that even the smallest binding site can induce the collapse of the entire bound protein into a specific chiral state. Although natural proteins have a well-defined chirality imposed by their amino acids, whereas the caterpillar model does not, the result presented here shows a mechanism by which a signal (e.g. binding to a substrate) could propagate to an entire protein.
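
To put the binding constants compared above (K_a ≥ 10^20 l/mol for ζ = 0.40 and 0.20, versus 10^5 ≤ K_a ≤ 10^14 l/mol for natural complexes) on an energy scale, one can use the standard relation ΔG° = −RT ln(K_a c°) with c° = 1 mol/l. The sketch evaluates it at 298.15 K purely as an illustrative assumption: the simulations themselves use reduced temperatures, so this conversion is meaningful for the experimental range and only indicative for the simulated values.

```python
import math

R_GAS = 8.314462618  # molar gas constant [J/(mol K)]


def binding_free_energy_kj(ka_l_per_mol: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy dG = -RT ln(Ka * c0) in kJ/mol, with c0 = 1 mol/l."""
    if ka_l_per_mol <= 0:
        raise ValueError("Ka must be positive")
    return -R_GAS * temperature_k * math.log(ka_l_per_mol) / 1000.0


# Natural range bounds and the lower bound found for the larger designed sites.
for ka in (1e5, 1e14, 1e20):
    print(f"Ka = {ka:.0e} l/mol -> dG = {binding_free_energy_kj(ka):.1f} kJ/mol")
```

At room temperature the natural range thus corresponds to roughly −28.5 to −79.9 kJ/mol, while K_a = 10^20 l/mol would correspond to about −114 kJ/mol.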

Figure 6: Folding free energy landscapes [F/k_BT] at reduced temperature 0.55 as a function of the intra-protein DRMSD_intra [Å] and the protein-binding-site DRMSD_inter [Å] from the native target structure (protein 1PGB bound to the binding site). Panels a to d refer to the systems with ζ = 0.20, 0.40, 0.60 and 0.80, respectively. Configurations corresponding to the free energy minimum of each system are shown in red, compared with the target bound configuration of protein G (in green).
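
Landscapes such as those in Figs. 5 and 6 are commonly obtained by histogramming the order parameters over equilibrium samples and taking F/k_BT = −ln P + const. The sketch below assumes unweighted samples of (DRMSD_intra, DRMSD_inter) collected at a single temperature; the reweighting or sampling scheme actually used is not described in this section, and the function name and bin width are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def free_energy_2d(samples: Iterable[Tuple[float, float]],
                   bin_width: float = 0.5) -> Dict[Tuple[int, int], float]:
    """F/k_BT on a 2D grid of (DRMSD_intra, DRMSD_inter) bins, shifted so the minimum is 0.

    Only visited bins are returned; unvisited bins correspond to F = +infinity.
    """
    counts = Counter((math.floor(x / bin_width), math.floor(y / bin_width))
                     for x, y in samples)
    if not counts:
        raise ValueError("no samples")
    total = sum(counts.values())
    f = {b: -math.log(c / total) for b, c in counts.items()}
    f_min = min(f.values())
    return {b: v - f_min for b, v in f.items()}
```

With this convention, the folded-and-bound basin would appear as the zero-valued bins near the origin (small DRMSD_intra and DRMSD_inter).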

Figure 7: Affinity log10 K_a (K_a in l/mol) as a function of the inverse reduced temperature 1/T for the investigated systems. The grey dashed line marks the inverse folding temperature 1/T_F. The red dashed line marks an estimate of the ambient temperature in reduced units. The colours of the curves refer to the ζ values: purple, green, yellow and light blue correspond to ζ = 0.20, 0.40, 0.60 and 0.80, respectively.

Binding specificity

Most protein binding sites require not only a high affinity for the target but also specificity towards it. To test the specificity of our artificial binding sites, we considered two distinct scenarios. First, we examine the specificity of our binding sites against proteins with identical structure but different sequence: we expose the binding site corresponding to ζ = 0.80 to the protein designed in complex with the surface at ζ = 0.20, and vice versa. The second scenario tests the binding and folding of a protein that differs in structure from protein G but is similar in size (the albumin-binding GA module, PDB ID 1gab). For this test we again consider the surfaces ζ = 0.20 and ζ = 0.80. Fig. 8 shows the binding affinity evaluated for all these scenarios. In all of them, the proteins bind to the binding sites only at high temperature, where the affinity becomes constant because the chain explores the conformational space of a swollen globule; here we recover the non-specific binding discussed for Fig. 7. On decreasing the temperature, the binding affinity decreases, also for 1/T above 1/T_F, indicating that folded compact structures are not compatible with binding and that the proteins preferably bind in a misfolded state (as also confirmed by Fig. FS7 in the SI). Moreover, since all the curves show the same trend, the binding sites appear inert towards random binding partners, irrespective of the protein sequence and structure. Although we did not reach sufficiently low temperatures, the presence of conformations with small but non-zero binding energy leads us to speculate that the binding affinity should eventually increase to positive values of log10 K_a, showing an unusual re-entrant behaviour.

Figure 8: Affinity log10 K_a (K_a in l/mol) as a function of the inverse reduced temperature 1/T for the investigated systems. The grey dashed line marks the inverse folding temperature of the designed protein G, while the green dashed line marks the inverse folding temperature of protein 1gab (see Fig. FS6 in the SI).

Conclusions

The BSs significantly influence the sequence space of the target protein, and the strength of this influence appears to correlate with the particular binding orientation of the protein. Nevertheless, the folding free energy profiles in the presence and absence of the BS reveal that all designed sequences fold into the target structure, both in the bulk and when bound to the binding site. Furthermore, the binding affinity of the protein for the binding site increases with the binding area. Hence, the results show that we successfully designed the protein-BS pair for all investigated systems. We also demonstrated the specificity of the BSs at all sizes. It is important to stress that the specificity arose spontaneously from the design process and did not require any refinement step. Hence, specificity seems to be a direct consequence of the steric selection imposed by the surface and of the sequence optimisation protocol used. The smallest BS that we investigated is still capable of achieving specific binding. Its binding constant is significant only close to ambient conditions, which sets a lower bound on the BS surface of 649 Å², comparable to the experimentally observed 750 Å² (Ref. 1). Conversely, the BSs with larger surfaces present extremely high binding constants even close to the unfolding temperature. Hence, the BS that shows a reasonable binding constant has a surface of 1324 Å², close to the experimentally observed maximum of 1500 Å² (Ref. 1). Such observations have far-reaching implications for understanding the evolution of protein-protein interactions and could be used to guide the design of novel artificial BSs with desired binding affinity and specificity.

Acknowledgement

All simulations presented in this paper were carried out on the Vienna Scientific Cluster (VSC). We acknowledge support from the VSC School, as well as from the Austrian Science Fund (FWF) project 26253-N27. V. B. acknowledges support from FWF Grant No. M 2150-N36.

References

(1) Arkin, M. R.; Wells, J. A. Small-molecule inhibitors of protein-protein interactions: progressing towards the dream. Nat. Rev. Drug Discov. 2004, 3, 301–317, DOI: 10.1038/nrd1343.
(2) Sotriffer, C. A.; Flader, W.; Winger, R. H.; Rode, B. M.; Liedl, K. R.; Varga, J. M. Automated docking of ligands to antibodies: methods and applications. Methods 2000, 20, 280–291.
(3) Fahmy, A.; Wagner, G. TreeDock: a tool for protein docking based on minimizing van der Waals energies. J. Am. Chem. Soc. 2002, 124, 1241–1250.
(4) Kastritis, P. L.; Bonvin, A. M. J. J. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J. R. Soc. Interface 2012, 10, 20120835, DOI: 10.1098/rsif.2012.0835.
(5) Den Boef, G.; Hulanicki, A. Recommendations for the usage of selective, selectivity and related terms in analytical chemistry. Pure Appl. Chem. 1983, 55, 553–556.
(6) Coluzza, I.; Frenkel, D. Designing specificity of protein-substrate interactions. Phys. Rev. E 2004, 70, 051917, DOI: 10.1103/PhysRevE.70.051917.
(7) Sear, R. P. Specific protein-protein binding in many-component mixtures of proteins. Phys. Biol. 2004, 1, 53–60, DOI: 10.1088/1478-3967/1/2/001.
(8) Sear, R. P. Highly specific protein-protein interactions, evolution and negative design. Phys. Biol. 2004, 1, 166–172, DOI: 10.1088/1478-3967/1/3/004.
(9) Deeds, E. J.; Ashenberg, O.; Gerardin, J.; Shakhnovich, E. I. Robust protein-protein interactions in crowded cellular environments. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 14952–14957.
(10) Coluzza, I.; Frenkel, D. Monte Carlo study of substrate-induced folding and refolding of lattice proteins. Biophys. J. 2007, 92, 1150–1156, DOI: 10.1529/biophysj.106.084236.
(11) Jacobs, W. M.; Frenkel, D. Predicting phase behavior in multicomponent mixtures. J. Chem. Phys. 2013, 139, DOI: 10.1063/1.4812461.
(12) Poma, A.; Turner, A. P. F.; Piletsky, S. A. Advances in the manufacture of MIP nanoparticles. Trends Biotechnol. 2010, 28, 629–637.
(13) Piletska, E. V.; Guerreiro, A. R.; Whitcombe, M. J.; Piletsky, S. A. Influence of the polymerization conditions on the performance of molecularly imprinted polymers. Macromolecules 2009, 42, 4921–4928.
(14) Ye, L.; Mosbach, K. Molecular imprinting: synthetic materials as substitutes for biological antibodies and receptors. Chem. Mater. 2008, 20, 859–868.
(15) Alexander, C.; Andersson, H. S.; Andersson, L. I.; Ansell, R. J.; Kirsch, N.; Nicholls, I. A.; O'Mahony, J.; Whitcombe, M. J. Molecular imprinting science and technology: a survey of the literature for the years up to and including 2003. J. Mol. Recognit. 2006, 19, 106–180.
(16) Yan, S.; Fang, Y.; Gao, Z. Quartz crystal microbalance for the determination of daminozide using molecularly imprinted polymers as recognition element. Biosens. Bioelectron. 2007, 22, 1087–1091.
(17) Whitcombe, M. J.; Alexander, C.; Vulfson, E. N. Smart polymers for the food industry. Trends Food Sci. Technol. 1997, 8, 140–145.
(18) Mosbach, K.; Ramström, O. The emerging technique of molecular imprinting and its future impact on biotechnology. Nat. Biotechnol. 1996, 14, 163–170.
(19) Wulff, G.; Sarhan, A. Use of polymers with enzyme-analogous structures for resolution of racemates. Angew. Chem., Int. Ed. 1972, p 341.
(20) Takagishi, T.; Klotz, I. M. Macromolecule-small molecule interactions; introduction of additional binding sites in polyethyleneimine by disulfide crosslinkages. Biopolymers 1972, 11, 483–491.
(21) Shoemaker, B. A.; Portman, J. J.; Wolynes, P. G. Speeding molecular recognition by using the folding funnel: The fly-casting mechanism. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8868.
(22) Levy, Y.; Wolynes, P. G.; Onuchic, J. N. Protein topology determines binding mechanism. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 511–516, DOI: 10.1073/pnas.2534828100.
(23) Kortemme, T.; Baker, D. Computational design of protein-protein interactions. Curr. Opin. Chem. Biol. 2004, 8, 91–97, DOI: 10.1016/j.cbpa.2003.12.008.
(24) Steffen, C.; Thomas, K.; Huniar, U.; Hellweg, A.; Rubner, O.; Schroer, A. Protein-Protein Docking Dealing With the Unknown. J. Comput. Chem. 2010, 31, 317–342, DOI: 10.1002/jcc.21576.
(25) Fuchs, J. E.; von Grafenstein, S.; Huber, R. G.; Wallnoefer, H. G.; Liedl, K. R. Specificity of a protein-protein interface: Local dynamics direct substrate recognition of effector caspases. Proteins Struct. Funct. Bioinforma. 2014, 82, 546–555, DOI: 10.1002/prot.24417.
(26) Havranek, J. J.; Harbury, P. B. Automated design of specificity in molecular recognition. Nat. Struct. Biol. 2003, 10, 45–52, DOI: 10.1038/nsb877.
(27) Ljubetič, A.; Gradišar, H.; Jerala, R. Advances in design of protein folds and assemblies. Curr. Opin. Chem. Biol. 2017, 40, 65–71, DOI: 10.1016/j.cbpa.2017.06.020.
(28) Polizzi, N. F.; Wu, Y.; Lemmin, T.; Maxwell, A. M.; Zhang, S.-Q.; Rawson, J.; Beratan, D. N.; Therien, M. J.; DeGrado, W. F. De novo design of a hyperstable non-natural protein-ligand complex with sub-Å accuracy. Nat. Chem. 2017, 9, 1157–1164, DOI: 10.1038/nchem.2846.
(29) Huang, P.-S.; Love, J. J.; Mayo, S. L. A de novo designed protein-protein interface. Protein Sci. 2007, 16, 2770–2774, DOI: 10.1110/ps.073125207.
(30) Sammond, D. W.; Bosch, D. E.; Butterfoss, G. L.; Purbeck, C.; Machius, M.; Siderovski, D. P.; Kuhlman, B. Computational design of the sequence and structure of a protein-binding peptide. J. Am. Chem. Soc. 2011, 133, 4190–4192, DOI: 10.1021/ja110296z.
(31) Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320–327, DOI: 10.1038/nature19946.
(32) Liu, Z.; Dominy, B. N.; Shakhnovich, E. I. Structural mining: Self-consistent design on flexible protein-peptide docking and transferable binding affinity potential. J. Am. Chem. Soc. 2004, 126, 8515–8528, DOI: 10.1021/ja032018q.
(33) Radhakrishnan, M. L.; Tidor, B. Specificity in molecular design: A physical framework for probing the determinants of binding specificity and promiscuity in a biological environment. J. Phys. Chem. B 2007, 111, 13419–13435, DOI: 10.1021/jp074285e.
(34) Liu, Z.; Huang, Y. Advantages of proteins being disordered. Protein Sci. 2014, 23, 539–550, DOI: 10.1002/pro.2443.
(35) Koehl, P.; Levitt, M. De novo protein design. I. In search of stability and specificity. J. Mol. Biol. 1999, 293, 1161–1181, DOI: 10.1006/jmbi.1999.3211.
(36) Coluzza, I. A Coarse-Grained approach to protein design: Learning from design to understand folding. PLoS One 2011, 6, e20853, DOI: 10.1371/journal.pone.0020853.
(37) Coluzza, I. Transferable Coarse-Grained Potential for De Novo Protein Folding and Design. PLoS One 2014, 9, e112852, DOI: 10.1371/journal.pone.0112852.
(38) Irbäck, A.; Sjunnesson, F.; Wallin, S. Three-helix-bundle protein in a Ramachandran model. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 13614–13618, DOI: 10.1073/pnas.240245297.
(39) Richardson, J. S.; Richardson, D. C. In Predict. Protein Struct. Princ. Protein Conform.; Fasman, G. D., Ed.; Springer US: New York, 1989; p 798, DOI: 10.1007/978-1-4613-1571-1.
(40) Coluzza, I.; Frenkel, D. Virtual-move parallel tempering. ChemPhysChem 2005, 6, 1779–1783, DOI: 10.1002/cphc.200400629.
(41) Lim, C. W.; Kim, T. W. Dynamic [2]Catenation of Pd(II) Self-assembled Macrocycles in Water. Chem. Lett. 2012, 41, 70–72, DOI: 10.1246/cl.2012.70.
(42) Hino, S.; Ichikawa, T.; Kojima, Y. Thermodynamic properties of metal amides determined by ammonia pressure-composition isotherms. J. Chem. Thermodyn. 2010, 42, 140–143, DOI: 10.1016/j.jct.2009.07.024.
(43) Kastritis, P. L.; Moal, I. H.; Hwang, H.; Weng, Z.; Bates, P. A.; Bonvin, A. M. J. J.; Janin, J. A structure-based benchmark for protein-protein binding affinity. Protein Sci. 2011, 20, 482–491, DOI: 10.1002/pro.580.
