
Letter

Coupling an ensemble of homologs improves refinement of protein homology models Andre Wildberg, Dennis Della Corte, and Gunnar F. Schröder J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.5b00942 • Publication Date (Web): 28 Oct 2015 Downloaded from http://pubs.acs.org on November 2, 2015


Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Theory and Computation

Coupling an ensemble of homologs improves refinement of protein homology models

André Wildberg¹†, Dennis Della Corte¹ and Gunnar F. Schröder¹,²*

¹ Forschungszentrum Jülich, Institute of Complex Systems, ICS-6: Structural Biochemistry, 52425 Jülich, Germany

² Physics Department, Heinrich-Heine Universität Düsseldorf, 40225 Düsseldorf, Germany

† Current address: Department of Chemistry and Biochemistry, University of California, 9500 Gilman Drive, La Jolla, CA 92093, USA

* Corresponding author

Abstract

Atomic models of proteins built by homology modeling or from low-resolution experimental data may contain considerable local errors. The refinement success of molecular dynamics simulations is usually limited both by force field accuracy and by the substantial width of the conformational distribution at physiological temperatures. We propose a method to overcome both problems by coupling homologous replicas during a molecular dynamics simulation, which narrows the conformational distribution and smooths, and even improves, the energy landscape by adding evolutionary information.

The interpretation of genomics data in terms of protein structure is an important post-genomic challenge. Building atomic models for individual amino acid sequences is becoming increasingly important for understanding the molecular effects of genetic variation. Homology modeling is a useful tool for building atomic models if the structure of a homologous protein is known. However, owing to the limitations of current methodology, such models of protein structures may contain considerable errors. Similarly, atomic models built from low-resolution (e.g. X-ray diffraction or cryo-EM) or sparse experimental data may contain comparable errors.

Refinement approaches aim to correct these errors in atomic protein models. The types of errors we consider amenable to refinement include disrupted hydrogen bond networks, small shifts of secondary structure elements, incorrect side-chain packing and rotamers, and wrong loop conformations. Correcting such errors is typically challenging because the energy differences between alternative, slightly different conformations are rather small. The Critical Assessment of Structure Prediction (CASP) experiment has a refinement category to test the performance of refinement methods 1,2.

Regular molecular dynamics (MD) simulations are generally unable to refine homology models and do not consistently yield a structure that is closer to the native structure (as usually determined by high-resolution X-ray crystallography) 3. Even though regular unrestrained MD simulations can sample closer-to-native structures, reliably selecting these structures is not possible 4.

The main causes of this limitation of MD simulations are 1) force field inaccuracies 5, 2) high energy barriers that need to be crossed, and 3) the fact that the Boltzmann distribution, which is approximated by the MD simulation, is broad at physiological temperatures. Simulation at physiological temperatures is, however, necessary for the entropy to contribute properly; only then is the free energy of conformational states correctly described.

Position restraints have been used successfully to prevent the simulation from exploring the broad Boltzmann distribution; these restraints force the structure to sample a region around the starting model, which also leads to sampling closer-to-native structures with higher probability 5-7. Position restraints, however, also set an upper limit to the extent of the conformational change, which might hinder sufficient sampling and refinement.

The goal of structure refinement is to determine the most probable conformation, which corresponds to the free energy minimum, rather than the conformational distribution. It has been shown that the most probable conformation can be approximated more robustly by averaging the structures from an MD ensemble than by selecting a single structure with a scoring function 6,7. However, for the averaging to yield a good structure, the simulation must predominantly sample near-native structures.
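The averaging idea can be illustrated with a minimal sketch. It assumes the trajectory frames have already been superimposed onto a common reference and are available as a NumPy array; the function name `average_structure` and the array layout are hypothetical choices for this illustration and are not taken from refs 6,7.

```python
import numpy as np


def average_structure(frames):
    """Return the mean coordinates over a stack of pre-superimposed frames.

    frames: array-like of shape (n_frames, n_atoms, 3).
    Assumption: rigid-body motion has already been removed by superposition;
    this function performs no fitting of its own.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 3 or frames.shape[2] != 3:
        raise ValueError("frames must have shape (n_frames, n_atoms, 3)")
    # Arithmetic mean of each atom position over all frames
    return frames.mean(axis=0)


if __name__ == "__main__":
    reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    shift = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
    # Two frames that fluctuate symmetrically around the reference
    print(average_structure([reference + shift, reference - shift]))
```

Symmetric fluctuations around a conformation cancel in the mean, which is why the average can lie closer to the most probable conformation than any single frame. Averaged coordinates may have distorted local geometry and would usually need a short minimization afterwards.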

Here we present, in two steps, a modified MD protocol that addresses all three problems of regular MD simulations mentioned above (force field inaccuracies, high energy barriers, broad Boltzmann distribution).

In the first modification step, we simultaneously simulate eight identical replicas of the starting structure. These replicas are subjected to the same harmonic position restraints (on Cα-atoms), which force them to remain similar to each other. The positions of the restraints are constantly updated during the simulation and slowly follow the motion of the center of mass of all replicas. These adaptive restraints were inspired by deformable elastic network (DEN) restraints, which have been shown to guide structure refinement against X-ray diffraction and cryo-EM data 8-10. Since the restraints are adaptive, the coupled replicas are allowed to undergo any conformational motion as long as they stay close together. The number of replicas was chosen here simply to limit the computational expense, and it is not yet clear how the results depend on the number of replicas.

The harmonic restraints restrict larger motions more than smaller motions, which leads to a time-scale dependent diffusion coefficient (cf. Supplementary Fig. 1). On short time-scales the diffusion coefficient is comparable to that of free MD simulations, which enables individual replicas to cross local energy barriers. In addition, entropic contributions of solvent and side-chains (which are not restrained) are not strongly affected, which means that the solvation free energy in particular is mostly unperturbed. On longer time-scales the diffusion coefficient decreases significantly, which reduces large conformational fluctuations. Smaller fluctuations mean that the system of coupled replicas is less likely to drift in random directions and will sample low free energy states more frequently than a free MD simulation. Furthermore, the coupling of replicas smooths the energy landscape, similar to particle swarm optimization, which has been applied to MD simulations before 11. The motion of the center of mass is the result of an effective force averaged over all replicas. Because the replicas are in different positions on the energy landscape, the center of mass moves on a locally averaged, i.e. smoothed, energy landscape (see Methods).
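The smoothing argument can be made concrete with a one-dimensional toy model. The sketch below is an illustration under strong assumptions and is not the energy function used in the simulations: the landscape E(x) = x²/2 + A cos(Kx), the amplitude `A`, the ripple wavelength `WAVELENGTH`, and the even spacing of replicas are all hypothetical choices made so that the cancellation is exact.

```python
import math

A = 1.0             # ripple amplitude (hypothetical, arbitrary units)
WAVELENGTH = 0.5    # ripple wavelength (hypothetical, arbitrary units)
K = 2.0 * math.pi / WAVELENGTH


def force(x):
    """Force -dE/dx on the toy landscape E(x) = x**2 / 2 + A * cos(K * x)."""
    return -x + A * K * math.sin(K * x)


def mean_force(center, n_replicas=8):
    """Force averaged over replicas spread evenly across one ripple wavelength.

    The offsets are symmetric around `center`, so their mean is zero and the
    average position of the replicas equals `center`.
    """
    offsets = [(j - (n_replicas - 1) / 2) * WAVELENGTH / n_replicas
               for j in range(n_replicas)]
    return sum(force(center + o) for o in offsets) / n_replicas


if __name__ == "__main__":
    for xc in (0.3, 1.0):
        # The single-replica force is dominated by the ripple; the averaged
        # force follows only the smooth harmonic part, -xc.
        print(xc, force(xc), mean_force(xc))
```

In this idealized setting the ripple term cancels completely and the averaged force equals that of the smooth harmonic well. Replicas in a real simulation are not spaced this regularly, so the cancellation would only be partial.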

In the second modification step, the target sequence in seven of the replicas is replaced with homologous sequences. This is motivated by the observation that structure is much more conserved than sequence, which causes homologous proteins to fold into similar structures 12. This fact can be exploited by coupling homologous proteins (with pairwise sequence identity of at least 50%) instead of identical replicas during an MD simulation. Keasar et al. 13-15 have proposed that such a coupling of homologous proteins with slightly different energy landscapes results in an energy landscape that is smoothed not only in structure space but also in sequence space. The methodology was implemented in the GROMACS 4.5.3 16 software (see Online Methods).

Our method was tested successfully in the recent CASP11 experiment and was among the best-performing methods 17. Of all 37 refinement targets, 65 % could be improved. For the improved models the average GDT-HA score increased by 6.6. For those models that could not be improved the average GDT-HA score decreased by 3.9. To understand why and how our approach works, we studied the refinement of 5 representative homology models in more detail. To avoid any bias we selected 5 models from a publicly available set of decoys that was not generated by us (the Badretdinov decoy set 18, see Supplementary Table 1).

Each model was simulated 5 times for 10 ns each. We compared three different simulation protocols: 1) a free MD simulation of the homology model, 2) coupled identical replicas, and 3) coupled homologous replicas.

Figure 1 shows, for each test case, the average and standard deviation of distance root-mean-square deviation (dRMSD) values from 5 independent simulations. The dRMSD is used to measure the deviation between two atomic models. It is calculated as the root mean square deviation of corresponding pairs of Cα-atom distances in the two structures. All possible pairs of Cα-atoms are considered.
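The dRMSD definition above can be written down in a few lines. The following is a minimal sketch, assuming the Cα coordinates of both structures are given as NumPy arrays in the same residue order; the function name `drmsd` is a hypothetical choice for this illustration.

```python
import numpy as np


def drmsd(coords_a, coords_b):
    """Distance RMSD between two structures with matching C-alpha atoms.

    coords_a, coords_b: array-like of shape (n_atoms, 3), same atom order.
    All n_atoms * (n_atoms - 1) / 2 atom pairs are included.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("both inputs must have the same shape (n_atoms, 3)")
    if a.shape[0] < 2:
        raise ValueError("at least two atoms are required")
    # Full pairwise distance matrices within each structure
    dist_a = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    dist_b = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    # Use each pair once (upper triangle without the diagonal)
    upper = np.triu_indices(a.shape[0], k=1)
    diff = dist_a[upper] - dist_b[upper]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because only internal distances enter, the measure is invariant under rotation and translation, so no superposition of the two structures is needed before the comparison.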

Simulations with coupled replicas, both homologous (Fig. 1, first column) and identical (Fig. 1, second column), show a clear improvement over regular free MD simulations (Fig. 1, third column). In free MD simulations the structure drifted away from the correct structure in all cases except 1hdn-1ptf, where a small number of frames were improved. In simulations with identical replicas 51 % of the frames were improved on average, but the improvements are small fluctuations around the null refinement (horizontal red line). In contrast, simulations with homologous replicas consistently improve the structure and sample conformations closer to the native structure most of the time.

The improvement of the structures from the different refinement protocols is quantified by the change in dRMSD of the average structure from each trajectory relative to that of the starting structure (see Fig. 2). Free MD simulations yielded the lowest structural quality in all 5 cases. Comparison of the average improvements (Fig. 2, dashed lines) shows an offset between free MD simulation and coupling of identical replicas, which can be attributed to the particle swarm optimization effect. More importantly, another offset can be seen between coupling with identical and with homologous replicas; it represents the improvement due to the added evolutionary information, which we interpret as an improvement of the force field. Only for 1lpt-1mzl, which has the lowest starting quality of 3.8 Å RMSD, was the average dRMSD not improved (see Fig. 1 c.1). However, the dRMSD of the average structure was slightly improved (Fig. 2), clearly showing the benefit of structural averaging 6.

Figure 3 compares the result of the simulation with coupled homologous replicas (green) with the starting model (purple) and the native target structure (blue). Several secondary structure elements and loop regions are shifted towards the correct structure. In contrast to free MD simulations, the coupled-replica simulations yielded very consistent results: the average structures from 5 independent simulations are very similar, as visualized in Supplementary Fig. 2 by multi-dimensional scaling 19.

We found that refinement with coupled homologous replicas outperforms regular MD simulations in all test cases. The additional evolutionary information and the reduction in global fluctuations through coupling of homologous replicas lead to consistent sampling of structures closer to the native state than in free MD simulations. This insight will help to develop even more powerful refinement methods based on MD.

METHODS

Implementation of replica coupling in GROMACS 4.5.3. For the replica-coupled simulations, the simulation box was composed of 8 replicas positioned at the corners of a cube. The distance between the replicas needs to be large enough to avoid electrostatic interactions between the replicas.

The replicas are coupled through adaptive position restraints on all Cα-atoms. GROMACS does not by default support dynamic updates of position restraints during a simulation. We implemented the necessary changes in the source code of GROMACS 4.5.3 16 to enable updates without reducing the speed of GROMACS. We implemented the changes only for domain decomposition runs (in source code file domdec.c). For each Cα-atom i in each replica j a position restraint $\mathbf{x}^{0}_{i,j}$ is defined on its initial position. The total energy is then the sum of the energy of the eight uncoupled (solvated) replicas and the time-dependent energy term for the position restraints, $E_{\mathrm{posre}}$, given by

$$E_{\mathrm{posre}}(t) = w \sum_{j=1}^{M} \sum_{i=1}^{N} \left( \mathbf{x}_{i,j}(t) - \mathbf{x}^{0}_{i,j}(t) \right)^{2} \qquad (1)$$

with the coordinates $\mathbf{x}_{i,j}$ of Cα-atom i in replica j, the number of atoms N, and the number of replicas M, which we chose to be 8. The force constant w was set to 100 kJ/(mol nm²). After a period, n, of 500 steps the position restraints are updated according to:

$$\mathbf{x}^{0}_{i,j}(t + n\Delta t) = \mathbf{x}^{0}_{i,j}(t) + \kappa \left\langle \mathbf{x}_{i,j}(t) - \mathbf{x}^{0}_{i,j}(t) \right\rangle_{j} \qquad (2)$$

with the integration time step Δt of 2 fs. The relaxation rate κ at which the position restraints follow the average coordinate displacement was set to 0.5. The same displacement vector $\langle \mathbf{x}_{i,j}(t) - \mathbf{x}^{0}_{i,j}(t) \rangle_{j}$, which is an average over the corresponding displacements in all replicas j, is added to the restraint positions of all replicas, which leads to a coupling of the replicas. These adaptive restraints were inspired by deformable elastic network (DEN) restraints, which yield a similar effect for a γ-value of 1. The original DEN method employs a network of distance restraints, including long-range ones, which cannot be parallelized efficiently with domain decomposition. We therefore decided to use adaptive position restraints.
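The restraint energy and the periodic update of the restraint positions can be sketched outside of GROMACS as follows. This is a minimal NumPy illustration and not the modified GROMACS code: the function names and the array layout (replicas, atoms, xyz) are hypothetical, the prefactor convention (w rather than w/2) is an assumption of this sketch, and only the values w = 100 kJ/(mol nm²) and κ = 0.5 are taken from the text.

```python
import numpy as np

W = 100.0     # force constant w in kJ/(mol nm^2), value from the text
KAPPA = 0.5   # relaxation rate kappa, value from the text


def restraint_energy(x, x0, w=W):
    """Harmonic position-restraint energy summed over replicas and atoms.

    x, x0: arrays of shape (M, N, 3) with current C-alpha coordinates and
    restraint positions in nm. Assumed form: w * sum of squared deviations.
    """
    return float(w * np.sum((x - x0) ** 2))


def update_restraints(x, x0, kappa=KAPPA):
    """Move restraint positions along the replica-averaged displacement.

    The displacement of each atom i is averaged over the M replicas and the
    same vector, scaled by kappa, is added to the restraint of that atom in
    every replica. Intended to be called once every n = 500 steps.
    """
    mean_disp = np.mean(x - x0, axis=0)   # shape (N, 3)
    return x0 + kappa * mean_disp         # broadcast over all M replicas


if __name__ == "__main__":
    x0 = np.zeros((2, 1, 3))                          # 2 replicas, 1 atom
    x = np.array([[[1.0, 0.0, 0.0]], [[3.0, 0.0, 0.0]]])
    print(restraint_energy(x, x0))                    # energy before update
    print(update_restraints(x, x0))                   # shifted restraints
```

Because every replica receives the identical shift, the restraint positions of different replicas keep their initial relative offsets and move together, which is what couples the replicas.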

For identical replicas the assignment of corresponding atoms in different replicas is trivial. However, in the case of homologous replicas, a multiple sequence alignment is performed to assign each Cα-atom from the starting sequence to the corresponding Cα-atoms in the homologs. If there are no gaps or insertions the assignment is again trivial. If the alignment shows a gap for k sequences at a certain amino acid position, then position restraints are applied and averaged only for the remaining (8-k) residues that are present at this position, which means that the displacement vector will be averaged over (8-k) replicas. Insertions do not generate extra position restraints; those residues are instead kept unrestrained and are free to move. The total number of position restraints is therefore always identical to the number of Cα-atoms in the target sequence.
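The gap rule can be expressed as a masked average. The sketch below is an assumption-laden illustration rather than the actual implementation: it supposes that the alignment has already been converted into a boolean array `present` with one row per replica and one column per target position, and the function name `masked_mean_displacement` is hypothetical.

```python
import numpy as np


def masked_mean_displacement(x, x0, present):
    """Average displacement per target position over the replicas present.

    x, x0:    arrays of shape (M, N, 3), coordinates and restraint positions
              indexed by target position (entries at gaps are ignored).
    present:  boolean array of shape (M, N); True where replica j has a
              residue aligned to target position i, False at a gap.
    Returns an array of shape (N, 3), averaged over (M - k) replicas where
    k is the number of gaps at that position.
    """
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    present = np.asarray(present, dtype=bool)
    counts = present.sum(axis=0)              # (N,), equals M - k
    if np.any(counts == 0):
        raise ValueError("every target position needs at least one replica")
    # Zero out contributions from gapped positions before summing
    disp = (x - x0) * present[..., None]
    return disp.sum(axis=0) / counts[:, None]
```

Since the target sequence itself is always present at each of its own positions, the count per position is at least one in the setting described above; the explicit check only guards against malformed input.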

Method availability. The modified GROMACS version with adaptive position restraints to couple multiple replicas is available from the SimTK website: http://simtk.org/home/adpt-gromacs.

MD Protocols. All simulations used the AMBER99SB-ILDN force field with TIP3P explicit water and an integration time step of 2 fs. The temperature was kept constant at 300 K by the Nosé-Hoover algorithm. Long-range electrostatic interactions were calculated with PME and bond lengths were constrained by the P-LINCS approach. Na⁺/Cl⁻ ions were added at physiological concentration. Before and after adding the solvent molecules, the structure was energy minimized to remove any steric clashes that may result from homology model building. Each simulation was repeated 5 times with a duration of 10 ns. For comparison we performed three different simulation protocols: 1) free MD simulation of a single protein structure, 2) replica-coupled simulation with identical sequences, and 3) replica-coupled simulation with homologous sequences.

The computational expense of the replica-coupled simulations is much larger than that of the single MD simulations, since the simulation system is eight times larger. The total amount of simulation time is equivalent to a simulation of a single protein in solvent for 4.25 µs.
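One way to arrive at the quoted total is to count single-protein equivalents per protocol. The breakdown below is our reading of the numbers given in the text (5 models, 5 repeats, 10 ns, one free protein versus eight coupled replicas in each of the two coupled protocols) and is not spelled out in this form in the manuscript; the variable names are hypothetical.

```python
# Values taken from the text
n_models = 5          # test cases
n_repeats = 5         # independent simulations per test case and protocol
ns_per_run = 10       # duration of each simulation in ns

# Assumed single-protein equivalents simulated per run for each protocol
proteins_per_run = {
    "free MD": 1,
    "coupled identical replicas": 8,
    "coupled homologous replicas": 8,
}

total_ns = n_models * n_repeats * ns_per_run * sum(proteins_per_run.values())
total_us = total_ns / 1000

print(total_ns)  # 4250
print(total_us)  # 4.25
```

Under these assumptions the three protocols together amount to 17 protein-equivalents per 10 ns run, which reproduces the 4.25 µs stated above.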

Sequence Selection Strategy. To build the homologous replicas, seven homologous sequences were identified with a BLAST 20 search of the RefSeq 21 database. Sequences were selected manually according to two criteria: 1) their sequence identity with the target sequence needs to be between 50 and 80 %, and 2) the sequence identities between all pairs of the 8 sequences should ideally also be in the range 50–80 %. However, for some test cases the second criterion could not be strictly fulfilled. The sequences chosen have an average sequence identity of 61.8 % to the target structure and are shown in Supplementary Table 2. The homology models used as the replicas were generated with MODELLERv9 22.
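The two selection criteria can be checked programmatically once the candidate sequences are aligned. The sketch below is illustrative only: the selection in this work was done manually, the function names are hypothetical, and the identity definition used here (matches divided by the number of alignment columns without a gap in either sequence) is an assumption that may differ from the identity reported by BLAST.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Gaps are written as '-'. Columns with a gap in either sequence are
    skipped (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(p, q) for p, q in zip(a, b) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(p == q for p, q in pairs) / len(pairs)


def check_selection(target, homologs, low=50.0, high=80.0):
    """Return (criterion_1, criterion_2) for a set of aligned sequences.

    criterion_1: every homolog is within [low, high] % identity to target.
    criterion_2: every pair among target and homologs is within that range.
    """
    crit1 = all(low <= percent_identity(target, h) <= high for h in homologs)
    all_seqs = [target] + list(homologs)
    crit2 = all(low <= percent_identity(a, b) <= high
                for a, b in combinations(all_seqs, 2))
    return crit1, crit2


if __name__ == "__main__":
    target = "AAAAAAAAAA"
    print(check_selection(target, ["AAAAAACCCC", "AAAAAADDDD"]))
```

Reporting the two criteria separately mirrors the text: the first is treated as a requirement, while the second is only an ideal that could not always be met.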

Test case selection strategy. Homology models from the Badretdinov decoy set 18 were chosen as test cases. We aimed to cover a wide range of protein properties, such as size, secondary structure composition and shape. The five homology models that were selected represent starting qualities between 2 and 4 Å RMSD to the solved crystal structure. The sequence lengths vary between 70 and 220 amino acids. The details of the selected models are shown in Supplementary Table 1. The naming scheme of the models is xxxxX-yyyy, where xxxx and yyyy are the PDB IDs of the template and the target, respectively, and X is the chain ID of the target.
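Model names of this form can be split into their parts with a small helper. This is a hedged sketch: the function name and the returned keys are hypothetical, the chain letter is treated as optional because some of the test case names used here (e.g. 1hdn-1ptf) carry none, and the pattern assumes classic four-character PDB IDs that start with a digit.

```python
import re

# Four-character PDB ID, optional one-letter chain ID, hyphen, second PDB ID
MODEL_NAME = re.compile(
    r"^([0-9][A-Za-z0-9]{3})([A-Za-z]?)-([0-9][A-Za-z0-9]{3})$"
)


def parse_model_name(name):
    """Split a model name such as '1dvrA-1ak2' into its components."""
    match = MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid model name: {name!r}")
    first_id, chain, second_id = match.groups()
    return {
        "template": first_id,      # xxxx in the naming scheme
        "chain": chain or None,    # X in the naming scheme, None if absent
        "target": second_id,       # yyyy in the naming scheme
    }


if __name__ == "__main__":
    for model in ("1dvrA-1ak2", "1hdn-1ptf", "1utrA-1utg"):
        print(model, parse_model_name(model))
```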

ACKNOWLEDGMENTS

The authors gratefully acknowledge the computing time granted on the supercomputer JUROPA at Jülich Supercomputing Centre (JSC).

SUPPORTING INFORMATION

Mean square displacement and multidimensional scaling plots for different simulation protocols, properties of test targets and homologous replicas. This information is available free of charge via the Internet at http://pubs.acs.org/.

FIGURE CAPTIONS:

Figure 1: Comparison of different refinement protocols (1–3). Five test models (a–e) have been simulated 5 times with three MD protocols: coupled homologous replicas (1, left column), coupled identical replicas (2, middle column), and free MD simulation (3, right column). dRMSD values were averaged over the 5 independent trajectories (black lines). The standard deviation is shown in pink. dRMSD values below the null refinement line (red line) indicate improved frames. The bottom row (f) shows the average and standard deviation over the 5 test cases for each refinement protocol.

Figure 2: The average dRMSD for each test case and each refinement protocol is plotted, as well as the average over the five test cases (dotted lines). Coupling identical sequences leads to a clear improvement over free MD simulations. Coupling of homologous replicas leads to a consistent refinement of all 5 test cases. Interestingly, the main improvement of using homologous versus identical replicas is observed for the two test cases (1dvrA-1ak2 and 1utrA-1utg) for which coupling of identical replicas was not successful.

Figure 3: For each test case, the starting structure (purple) is shown together with the best model from the simulations with coupled homologous replicas (green) and the native structure (blue). Refined regions are highlighted by red arrows. Figures were made with Chimera 23.

REFERENCES

1. MacCallum, J.L.; Pérez, A.; Schnieders, M.J.; Hua, L.; Jacobson, M.P.; Dill, K.A. Assessment of Protein Structure Refinement in CASP9. Proteins: Struct., Funct., Bioinf. 2011, 79, 74-90.

2. Nugent, T.; Cozzetto, D.; Jones, D.T. Evaluation of Predictions in the CASP10 Model Refinement Category. Proteins: Struct., Funct., Bioinf. 2014, 82, 98-111.

3. Fan, H.; Mark, A.E. Refinement of Homology based Protein Structures by Molecular Dynamics Simulation Techniques. Protein Sci. 2004, 13, 211-220.

4. Zhu, J.; Fan, H.; Periole, X.; Honig, B.; Mark, A.E. Refining Homology Models by Combining Replica exchange Molecular Dynamics and Statistical Potentials. Proteins: Struct., Funct., Bioinf. 2008, 72, 1171-1188.

5. Raval, A.; Piana, S.; Eastwood, M.P.; Dror, R.O.; Shaw, D.E. Refinement of Protein Structure Homology Models via Long, All-atom Molecular Dynamics Simulations. Proteins 2012, 80, 2071-2079.

6. Mirjalili, V.; Feig, M. Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles. J. Chem. Theory Comput. 2013, 9, 1294-1303.

7. Mirjalili, V.; Noyes, K.; Feig, M. Physics based Protein Structure Refinement through Multiple Molecular Dynamics Trajectories and Structure Averaging. Proteins: Struct., Funct., Bioinf. 2014, 82, 196-207.

8. Schröder, G.F.; Brunger, A.T.; Levitt, M. Combining Efficient Conformational Sampling with a Deformable Elastic Network Model Facilitates Structure Refinement at Low Resolution. Structure 2007, 15, 1630-1641.

9. Schröder, G.F.; Levitt, M.; Brunger, A.T. Super-resolution Biomolecular Crystallography with Low-resolution Data. Nature 2010, 464, 1218-1222.

10. Schröder, G.F.; Levitt, M.; Brunger, A.T. Deformable Elastic Network Refinement for Low-resolution Macromolecular Crystallography. Acta Cryst. 2014, D70, 2241-2255.

11. Huber, T.; van Gunsteren, W.F. SWARM-MD: Searching Conformational Space by Cooperative Molecular Dynamics. J. Phys. Chem. A 1998, 102, 5937-5943.

12. Chothia, C.; Lesk, A.M. The Relation between the Divergence of Sequence and Structure in Proteins. EMBO J 1986, 5, 823-826.

13. Keasar, C.; Elber, R.; Skolnick, J. Simultaneous and Coupled Energy Optimization of Homologous Proteins: A New Tool for Structure Prediction. Fold Des 1997, 2, 247-259.

14. Keasar, C.; Tobi, D.; Elber, R.; Skolnick, J. Coupling the Folding of Homologous Proteins. Proc. Natl. Acad. Sci. USA 1998, 95, 5880-5883.

15. Keasar, C.; Elber, R. Homology as a Tool in Optimization Problems: Structure Determination of 2D Heteropolymers. J. Phys. Chem. 1995, 99, 11550-11556.

16. Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M.R.; Smith, J.C.; Kasson, P.M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A High-throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29(7), 845-854.

17. CASP11 Refinement Results, http://predictioncenter.org/casp11/zscores_final_refine.cgi (accessed May 16, 2015).

18. Badretdinov, A. (1997). Sali Lab Decoy Sets for Model Assessment. ftp://salilab.org/pub/azat/models-a/ (accessed Jan 21, 2006).

19. MDSJ: Java Library for Multidimensional Scaling, Version 0.2. Algorithmics Group, University of Konstanz, 2009. http://www.inf.uni-konstanz.de/algo/software/mdsj/ (accessed Jun 12, 2015).

20. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25(17), 3389-3402.

21. Pruitt, K.D.; Tatusova, T.; Maglott, D.R. NCBI Reference Sequences (RefSeq): A Curated Non-redundant Sequence Database of Genomes, Transcripts and Proteins. Nucleic Acids Res. 2007, 35, D61-D65.

22. Fiser, A.; Sali, A. Modeller: Generation and Refinement of Homology-based Protein Structure Models. Methods Enzymol. 2003, 374, 461-491.

23. Pettersen, E.F.; Goddard, T.D.; Huang, C.C.; Couch, G.S.; Greenblatt, D.M.; Meng, E.C.; Ferrin, T.E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605-1612.


TOC


[Figure 1. Panels a.1–f.3. Rows: 1dvrA-1ak2 (a), 1hdn-1ptf (b), 1lpt-1mzl (c), 1pod-1poa (d), 1utrA-1utg (e), and the average over the test cases (f). Columns: replica restraints with homologous sequences (1), replica restraints with identical sequences (2), free MD simulation of a single protein (3). Axes: dRMSD (Å) versus time (ns), 0 to 10 ns. Legend: start dRMSD, mean dRMSD, null refinement, mean refinement ± standard deviation.]

[Figure 2. Annotations: coupling impact; coupling + sequence impact.]

[Figure 3. Structure panels for 1dvrA-1ak2, 1hdn-1ptf, 1lpt-1mzl, 1pod-1poa, and 1utrA-1utg.]