Systematic Comparison of Amber and Rosetta Energy Functions for

Sep 21, 2018 - Haldar, Comitani, Saladino, Woods, van der Kamp, Mulholland, and Gervasio. 0 (0),. Abstract: Drug–target binding kinetics has recentl...
1 downloads 0 Views 1MB Size
Subscriber access provided by University of South Dakota

Biomolecular Systems

Systematic comparison of Amber and Rosetta energy functions for protein structure evaluation. Aliza B Rubenstein, Kristin Blacklock, Hai Nguyen, David A. Case, and Sagar D Khare J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00303 • Publication Date (Web): 21 Sep 2018 Downloaded from http://pubs.acs.org on September 22, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Systematic comparison of Amber and Rosetta

2

energy functions for protein structure

3

evaluation.

4

Aliza B. Rubenstein1,2, Kristin Blacklock3,4, Hai Nguyen3,4†, David A. Case1,2,3,4*, Sagar

5

D. Khare1,2,3,4*

6

1

Computational Biology & Molecular Biophysics Program 2

Institute for Quantitative Biomedicine

7

8

9

10

3

Department of Chemistry and Chemical Biology 4

Center for Integrative Proteomics Research

Rutgers, The State University of New Jersey, Piscataway NJ 08854

11 12

KEYWORDS: Amber, ff14SBonlySC, Rosetta, talaris2014, REF2015, comparative

13

analysis, structure prediction, decoy discrimination, loop modeling

14

15 1 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

ABSTRACT

2

An accurate energy function is an essential component of biomolecular structural

3

modeling and design. The comparison of differently derived energy functions enables

4

analysis of the strengths and weaknesses of each energy function, and provides

5

independent benchmarks for evaluating improvements within a given energy function.

6

We compared the molecular mechanics Amber empirical energy function to two

7

versions of the Rosetta energy function (talaris2014 and REF2015) in decoy

8

discrimination and loop modeling tests. In decoy discrimination tests, both Rosetta and

9

Amber (ff14SBonlySC) energy functions performed well in scoring the native state as

Page 2 of 38

10

the lowest energy conformation in many cases, but several false minima were found in

11

with both talaris2014 and Amber ff14SBonlySC scoring functions. The current default

12

version of the Rosetta energy function, REF2015, which is parameterized on both small

13

molecule and macromolecular benchmark sets to improve decoy discrimination,

14

performs significantly better than talaris2014, highlighting the improvements made to

15

the Rosetta scoring approach. There are no cases in Rosetta REF2015, and 8/140

16

cases in Amber, where a false minimum is found that is absent in the alternative

17

landscape. In loop modeling tests, Amber ff14SBonlySC and REF2015 perform

18

equivalently, although false minima are detected in several cases for both. The balance

19

between dihedral, electrostatic, solvation and hydrogen bonding scores contribute to the

20

existence of false minima. To take advantage of the semi-orthogonal nature of the

21

Rosetta and Amber energy functions, we developed a technique that combines Amber

22

and Rosetta conformational rankings to predict the most near-native model for a given

23

protein. This algorithm improves upon predictions from either energy function in 2 ACS Paragon Plus Environment

Page 3 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

isolation, and should aid in model selection for structure evaluation and loop modeling

2

tasks.

3 4 5

Introduction Computational protein structure prediction is dependent on an accurate energy

6

function. The native state of a protein is expected to be found uniquely at the minimum

7

of the energy function1; therefore, the energy function must robustly discriminate

8

between native and non-native conformations. A variety of energy functions to predict

9

protein structure have been implemented over the past forty years2–8. These potentials

10

largely fall into one of two categories: molecular mechanics force fields that rely on the

11

combination of various empirical potentials such as Lennard-Jones, torsional energies,

12

Coulombic interactions, and desolvation penalties3,4,7 and statistical or knowledge-

13

based potentials that depend on characteristics of known protein structures2,5,6. While

14

molecular mechanics force-fields are generally parameterized on small molecule

15

properties7,9–11, statistical potential parameter optimization is often guided by known

16

biomolecular structures12–14. Each approach has its own drawback: since parameters

17

in physically derived force-fields are fit based on small molecule properties, they may

18

not be suited to macromolecules15,16: for example, force-fields will often display biases

19

towards secondary structure propensities15,17. On the other hand, statistical potentials

20

are trained on specific datasets of large biomolecules, and data sparseness may lead to

21

overfitting18.

22

The Rosetta macromolecular modeling program energy functions combine elements

23

of both categories; they contains physical force-field terms (Lennard-Jones interactions, 3 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

electrostatic interactions, desolvation penalties, etc.) and statistical potentials

2

(probability of amino acid identity given backbone angles, probability of backbone

3

angles given amino acid identity, probability of backbone-dependent rotamer, etc.)19.

4

The most recent Rosetta energy function (REF2015) is parameterized on both small

5

molecule properties and large sets of biomolecular structures18, although previous

6

energy functions were generally parameterized for prediction of known biomolecular

7

structures and sequences13. While efforts have been made to compare the

8

performance of various empirical force-fields17,20,21, little attention has been focused on

9

the comparison between the Rosetta energy function and empirical force-fields used in

10 11

Page 4 of 38

molecular mechanics. The Amber ff14SBonlySC force field10 uses a standard fixed-charge molecular

12

mechanics potential, with torsion potentials based entirely on fits to quantum chemistry

13

data. It is very similar to the more commonly-used ff14SB protein force field, but does

14

not include the empirical modifications to backbone torsion potentials that are present in

15

ff14SB, and which provide an improved balance of secondary structure propensities in

16

explicit solvent simulations. Hence, ff14SBonlySC is more "physics-based" than is

17

ff14SB, and it arguably better suited for the implicit solvent energy evaluations used

18

here, since the empirical backbone torsional potentials in ff14SB might be specific to its

19

use with an explicit solvent representation. The ff14SBonlySC force field, in combination

20

with a generalized Born implicit solvent model, was parameterized using protein/peptide

21

folding22, has been shown to fold a variety of single-domain proteins using unrestrained

22

molecular dynamics simulations23.

4 ACS Paragon Plus Environment

Page 5 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Comparing the Amber force-field and Rosetta energy function performance at

2

structure evaluation elucidates the strengths and areas of improvements for each

3

energy function. As Rosetta energy functions have been developed based on improving

4

performance for certain modeling datasets, testing their performance on the same

5

macromolecular datasets may result in overfitting of the Rosetta energy function, while

6

comparing their performance to that of a (more) physics-based Amber energy function

7

is a relatively unbiased comparison for evaluating performance improvements.

8

Therefore, we chose two types of benchmark sets for energy function evaluation: first,

9

decoy discrimination benchmark sets that were used for the development of the Rosetta

10

REF2015 energy function18, and second, a loop modeling benchmark set from the

11

Kortemme group that were generated using Rosetta sampling but not utilized for

12

REF2015 development. Comparison of two Rosetta energy functions with the same

13

Amber energy function for these two benchmark sets provides insights into the

14

improvements made in the energy functions and highlights areas for further

15

improvement. Finally, selecting a correct near-native model for a given sequence is an

16

elementary challenge; the combination of these two semi-orthogonal energy functions

17

provides a method for model selection that is able to select more accurate models.

18

Methods

19

Benchmark Sets

20

To evaluate and compare the performance of Rosetta and Amber energy functions,

21

we used two benchmark sets, a structure evaluation (decoy discrimination) set and a

22

loop modeling set. The decoy discrimination benchmark set includes a total of 150

23

proteins, a combination of two independent decoy sets used in previous studies18,24. 5 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 38

1

The proteins in the set are monomeric and have crystallographic native structures

2

available in the RCSB PDB25 with resolution < 2.0 Å. The protein lengths range from 50

3

to 200 residues and have a diverse range of topologies. The decoy sets were originally

4

generated using biased and unbiased ab-initio sampling runs26 followed by parallel

5

loophash sampling (PLS)27. This produced 40,000-200,000 decoys per protein, ~1000

6

representative low-energy structures of which were chosen for each protein to cover the

7

range of possible C-α RMSD values.

8 9

The loop modeling benchmark set consisted of the 45-PDB dataset for 12-residue loops in the monomeric protein loops training set of the 2016 Collaborative Assessment

10

and Development of Rosetta Energetics and Sampling (CADRES) initiative in the

11

Rosetta community. This loop modeling benchmark set was obtained from Shane

12

O’Connor and Tanja Kortemme (personal communication).

13

Structure Preparation

14

Rosetta

15

In the decoy discrimination benchmark, the native crystallographic structures for each

16

protein set were downloaded from the RSCB PDB and residues were trimmed from the

17

structure to match the sequence of the crystal with the decoy structure in the

18

benchmark sets. Native structures were used to evaluate RMSD from native for decoy

19

conformations. Native structures were relaxed using FastRelax26 with the talaris201413

20

energy function to relieve any clashes. One hundred relaxation trajectories were

21

simulated to generate one hundred relaxed native-like decoys. These native-like decoys

22

were used for false minima analysis. Then, these one hundred native-like decoys,

23

along with the ~1000 pre-sampled decoys, were subjected to backbone and sidechain 6 ACS Paragon Plus Environment

Page 7 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

minimization using talaris2014 and the Limited-memory Broyden-Fletcher-Goldfarb-

2

Shanno (LBFGS) minimizer implementation with inexact line search conditions

3

(lbfgs_armijo_nonmonotone) over a maximum of 2000 iterations for convergence. C-α

4

atom RMSD was calculated for all decoys.

5

The REF2015 decoy dataset was obtained from F. DiMaio and H. Park18. For this

6

dataset, each decoy was relaxed with 3 cycles torsion-space minimization and 2 cycles

7

Cartesian mode24 using the REF2015 energy function19. Only 140 out of 150 protein

8

systems were included in this set due to the lower quality of experimentally determined

9

structures for 10 systems (H. Park and F. DiMaio, personal communication). Those 10

10 11

systems are ignored when comparing REF2015 to Amber. In the loop modeling benchmark, the native crystal structures for each protein set

12

were downloaded from the RCSB PDB and trimmed of residues that were not present in

13

the decoy PDB structures. The backbone and sidechain geometries for residues in the

14

loop region of each decoy structure were minimized in Rosetta using the talaris2014 or

15

REF2015 energy function and the lbfgs_armijo_nonmonotone over a maximum of 2000

16

iterations for convergence. C-α RMSDs were calculated with respect to the crystal

17

structure over loop residues only without fitting; since the protein scaffold was fixed

18

during optimization, this statistic describes the extent of loop deviation. Loop residues

19

are defined in supplementary file LoopDefs.xlsx.

20

Amber

21

Hydrogens were removed from the crystal structures and decoy PDBs, and initial

22

structures were built using the tLEaP module of AmberTools28 with the ff14SBonlySC10

23

forcefield parameters. Minimizations were carried out for a maximum of 1000 steps 7 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 38

1

under the LBFGS quasi-Newton algorithm29 with a convergence criterion of 0.01

2

kcal/mol-A. In the loop modeling benchmark, positional restraints were added to all non-

3

loop-residue atoms except for hydrogens with a force constant of 10.0 kcal mol-1 A-2.

4

Solvent effects were treated with a generalized Born implicit solvent model (GB-

5

Neck222) implemented in the Amber1628 package with mbondi3 radii and a cutoff value

6

of 999A for nonbonded interactions. Total potential energies of minimized structures

7

and C-α RMSDs with respect to the crystal structure were obtained using the pytraj

8

2.0.0 interactive molecular dynamics simulation data analysis Python package30, which

9

is a Python interface for cpptraj in AmberTools1628. In the loop modeling benchmark,

10

C-α RMSDs were calculated over the loop residues only. Six sets of decoy structures

11

for the loop modeling benchmark were unable to be minimized in Amber due to missing

12

residues, and those sets were not considered in subsequent analyses (PDB codes

13

1cb0, 1dts, 1m3s, 1ms9, 1t1d, and 2pia).

14

Energy Landscape Generation

15

Energy landscapes (RMSD vs. normalized energy scatterplots) were generated for all

16

proteins for both Rosetta and Amber. The ideal shape of an energy landscape is that of

17

a funnel (i.e. Figure 1A) where the lowest-scoring decoy conformations are of near-

18

native RMSD. We use the binned Boltzmann metric (see below) to evaluate the funnel

19

shape of each energy landscape.

20

Energy Normalization

21

For each set of energies per energy function per protein, energies are normalized so

22

that the gap between the 5th percentile of and the 95th percentile is equal to 1. This

8 ACS Paragon Plus Environment

Page 9 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

enables the comparison of energies between different structures and between different

2

energy functions. This is accomplished via the following equation:

3 4 5

Ei refers to the raw energy of decoy i. Emin is the minimum energy value, E95th is the 95th percentile energy, and E5th is the 5th percentile energy

6

Funnel Evaluation Metric

7

We use the binned Boltzmann metric, B, for energy landscape evaluation, as

8

described previously by Park et al.18. This metric finds the Boltzmann probability of

9

selecting native-like decoys over high-RMSD decoys based on their energy values. As

10

in previous work24, the metric is averaged over multiple thresholds for determining

11

native-like status for each decoy.

12 13

14

9 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 38

1 2

Figure 1. (A-C) Energy landscapes for 2QY7, 1T2I, and 1SEN respectively using Rosetta

3

talaris2014 (salmon; left), Rosetta REF2015 (purple; middle) and Amber ff14SBonlySC

4

(turquoise; right) energy functions. Each dot on the plot represents one decoy conformation.

5

The x-axis is RMSD from native crystal structure and the y-axis is normalized energy. False

6

minima (defined as decoys within top 10 energies but with RMSD > 5.0 Å) are depicted in black.

7

The B metric, which represents the efficacy of the energy function at differentiating between

8

native and non-native decoys, is shown at the top right corner of each plot. (D-F) Superimposed

9

native (gray) and Rosetta talaris2014 lowest-ranking false minimum decoy (salmon) and/or

10

superimposed native (gray) and Amber lowest-ranking false minimum decoy (turquoise) for

11

2QY7, 1T2I, and 1SEN respectively.

10 ACS Paragon Plus Environment

Page 11 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

The conformation index is i and j is the native threshold definition index. Cutoffs are

2

0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.0, 5.0, 6.0 Å and Nj is thus 14.

3

Ei(norm) is the score of decoy i as determined in Rosetta or Amber and normalized as

4

described above. The factor β refers to the Boltzmann factor and has a value of 0.1, as

5

this was the value of β used by Park et al. previously18. dij determines whether decoy i

6

is considered native at threshold j; it is set to 1 if it is native and 0 if it is not. As the sum

7

of the probabilities of the non-native-like conformations approaches 0, the numerator

8

(

9

approaches to 1. As mentioned in Park et al.18, the B metric is better than the

) approaches the value of the denominator (

), so that the value of B

10

previously used S metric24 due to a larger increase in the metric for a poor energy

11

landscape vs. a good energy landscape than the increase from an already good energy

12

landscape to a steeper energy landscape. Additionally, it is a smoother metric that is

13

less affected by single-decoy outliers.

14

Model Selection

15

For each protein, a model was selected by finding the decoy that had the lowest sum

16

of Amber and Rosetta ranks; this decoy also satisfies the criteria for Pareto-optimality.

17

First, the Amber scores and Rosetta scores were converted into ranks so that the rank

18

of decoy a was less than the rank of decoy b if the energy of decoy a was less than the

19

energy of decoy b. Second, the Pareto-optimal solutions are found as follows. Decoy a

20

is defined as dominating decoy b if both ranks (Rosetta and Amber) of decoy a are 1) decoys when performing consensus scoring according to the two

7

energy functions.

8

Results

9 10 11

Performance of Amber and Rosetta energy functions in discriminating between native and non-native structures Protein free energy landscapes involve folding funnels31–33 which enable the folding

12

chain to efficiently find the native state, and their existence implies that the higher

13

energy of non-native (decoy) structures compared to the native (e.g.,

14

crystallographically-determined) structure drives protein folding. Therefore, a common

15

test used for evaluating23,26 and improving18 energy functions is the decoy discrimination

16

test, in which the evaluated scores of decoy structures are compared to that of near-

17

native structures. High-RMSD decoys which have comparable energies to near-native

18

structures are classified as “false minima”, and are indicative of inaccuracies in the

19

energy function. The B metric18, ranging from 0 to 1, quantifies the existence of false

20

minima in a set of structures upon evaluation with a given energy function, with values

21

close to 1 indicating a smooth folding funnel with no false minima. Conversely, a lower

22

B value indicates that one or more false minima exist.

12 ACS Paragon Plus Environment

Page 13 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

We compared the performance of the Amber energy function and both of the Rosetta

2

energy functions at ranking native state structures lower than decoy conformations for a

3

set of 150 proteins (talaris2014 and Amber) or a set of 140 proteins (REF2015 and

4

Amber). Amber ff14SBonlySC generally performed better than Rosetta talaris2014,

5

scoring significantly higher B metrics for many systems (Figure 2A). We also compared

6

Amber to the default Rosetta energy function, REF201519, and found that while Amber

7

did have a higher B metric for several systems, several other systems had a higher B

8

metric when scored by REF2015 (Figure 2B), thus showing the improvement of

9

REF2015 over talaris2014 (Figure 2C) when compared to Amber as an unbiased

10

benchmark. The REF2015 energy function was developed to maximize the

11

discrimination between native and decoy structures using the same benchmark

12

datasets (among other tests), so it is not surprising that improvements over talaris2014

13

are observed. While many terms in the Amber ff14SBonlySC energy function were

14

developed using small molecule data, solvation parameters were fit to reproduce more

15

accurate Poisson-Boltzmann calculations for a range of peptides and proteins22. Thus,

16

this energy function, similar to REF2015, has been trained to discriminate decoy

17

structures from natives. While there is no overlap between the decoy discrimination

18

training sets used for ff14SBonlySC and REF2015, the performance of the two energy

19

functions in decoy discrimination tests is similar (Table 1).

13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 38

1 2

Figure 2. Scatterplots to depict general performance of the scoring functions in decoy

3

discrimination: (A)Rosetta talaris2014 scoring function vs. Amber scoring function, (B) Rosetta

4

REF2015 scoring function vs. Amber scoring function, and (C) Rosetta REF2015 scoring

5

function vs. talaris2014 scoring function over the entire decoy discrimination set. Each dot

6

represents the B metric for one system. The black line is x=y and the dashed lines show the B-

7

values +/- 0.1 from the x=y line. Decoy sets discussed in more depth are annotated with the

8

PDB ID of that set.

9 10

To investigate the sources of misprediction and improvements in energy functions, we

11

examined cases in which either Amber, Rosetta, or both were unable to correctly rank

12

high-RMSD decoy conformations, scoring them as low-scoring instead of high-scoring.

13

A false minimum is defined as a decoy within the top-10 ranked decoys that has a C-α

14

RMSD from native of greater than 5 Å. Three of these cases are shown in Figure 1.

15

2QY7 (Figure 1A,D) has several false minima for Rosetta talaris2014 but none for

16

Amber. Generally, Rosetta talaris2014 alone had at least one false minimum in 16.0%

17

of structures, and Rosetta REF2015 alone had no false minima that were absent in

18

talaris2014 or Amber. 1T2I (Figure 1B,E) has a false minimum for Amber but none for 14 ACS Paragon Plus Environment

Page 15 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Rosetta talaris2014;1.3% and 5.7% of systems have at least one false minimum for

2

Amber alone when compared to talaris2014 and REF2015, respectively. 1SEN (Figure

3

1C,F) has false minima for both Amber and Rosetta talaris2014, as do 14% of overall

4

structures in the Rosetta talaris2014/Amber comparison, and 7.1% in the Rosetta

5

REF2015/Amber comparison (Table 1).

6 7

Table 1. B metric, false minima, and model selection summary comparisons for Amber

8

ff14SBonlySC, Rosetta talaris2014, and Rosetta REF2015 energy functions. No. Cases/Total No. Proteins Decoy Discrimination ff14SBonlySC B > talaris2014 B by 0.1 talaris2014 B > ff14SBonlySC B by 0.1 ff14SBonlySC B > REF2015 B by 0.1 REF2015 B > ff14SBonlySC B by 0.1 False minima in ff14SBonlySC only (not talaris2014) False minima in talaris2014 only (not ff14SBonlySC) False minima in ff14SBonlySC and talaris2014 False minima in ff14SBonlySC only (not REF2015) False minima in REF2015 only (not ff14SBonlySC) False minima in ff14SBonlySC and REF2015 Minimum-sum RMSD < ff14SBonlySC selected RMSD by 1.0 Å Minimum-sum RMSD < REF2015 selected RMSD by 1.0 Å Loop Modeling ff14SBonlySC B > talaris2014 B talaris2014 B > ff14SBonlySC B

54/150 0/150 6/140 9/140 2/150 24/150 21/150 8/140 0/140 10/140 14/140 13/140 15/39 7/39

9 10

We next investigated if any structural patterns and features can be discerned in the

11

landscapes where false minima are observed selectively for one energy function. As

12

REF2015 landscapes had no false minima that were not present in Amber landscapes,

13

we restricted our analyses to talaris2014 landscapes for these analyses. 15 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 38

1

Superimpositions of false minima decoys with native decoys show their distinct non-

2

native conformations involving both misprediction of secondary structure elements as

3

well as their incorrect relative placement in tertiary structures. In the case of 2QY7, a

4

Rosetta talaris2014 false minimum, the four-helix bundle found in the structure is

5

significantly perturbed in the false minimum as the order of the first two helices is

6

reversed; thus, while they do not contact the other two helices as tightly as that of the

7

native structure (Supplementary Figure 1E, I-J) their energies of interactions are found

8

to be favorable. The difference between the native structure of 1T2I and its Amber false

9

minimum is more subtle. While the contact maps for the native and false minimum

10

conformations are similar, except for a small contact region in the native structure

11

between residues 40 and 59 that does not appear in the false minimum (Supplementary

12

Figure 1K-L), the false minimum is slightly more compact and has a more ordered

13

secondary structure. Two beta sheet regions in the false minimum are beta

14

strands/unordered in the native structure and two alpha helices in the false minimum

15

are beta strands in the native structure (Supplementary Figure 1F-G).

16

The case of 1SEN, which has false minima for both Rosetta and Amber, is similar to

17

1T2I in that the false minima are more ordered than the native structure, although the

18

native structure forms more contacts between remote regions than do the false minima

19

(Supplementary Figure1A-D). Residues 85-96 form a tight beta hairpin in the false

20

minima, whereas the native residues 85-96 is a longer loop between the beta strands,

21

resulting in a shorter, less tight, beta hairpin. Additionally, the stretch of residues 94-109

22

in the native does not have any secondary structure, while that of the false minimum

23

begins as a beta strand and ends in an alpha helix (Supplementary Figure 1H). Decoys

16 ACS Paragon Plus Environment

Page 17 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

that are predicted as false minima often have the same overall structure and contact

2

maps as native structures, yet secondary structure differences may result in large

3

structural deviation. In some cases, false minima contain more ordered secondary

4

structures yet fewer contacts than native conformations; the propensity away from

5

disordered loops may result in lower energies for these false minima.

6 7

To investigate whether certain residues, structural elements, or energy terms

8

contribute more to false minima conformations, we analyzed the per-residue score

9

decomposition for Rosetta talaris2014 scores for the three systems outlined above

10

(2QY7, 1T2I, and 1SEN). We were unable to perform the same decomposition for

11

Amber as the GB solvation term is not pairwise-decomposable28. We calculated the Z-

12

scores for each residue over the lowest-scoring native and false minimum

13

conformations. We identified residues as possibly implicated in false minima if the false

14

minimum residue Z-score was lower than the native residue Z-score by at least one (i.e.

15

the distance between the two was greater than one standard deviation). We have

16

highlighted these residues (Figure 3A-C). False minima contributing residues were

17

distributed over the conformations and did not cluster to any particular region.

18

Moreover, false minima contributing residues were found in various types of secondary

19

structure: alpha helices, beta strands, and loops. It is therefore currently not possible to

20

attribute Rosetta false minima to any single per-residue propensity, but as expected,

21

several small errors in energy estimation may lead to the observed incorrect scoring.

17 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 38

1 2

Figure 3. Per-residue and per-score-term propensity of energy functions toward false minima.

3

(A-C) Native (gray) and Rosetta-talaris2014-minimized (salmon) structures of 2QY7, 1T2I, and

4

1SEN respectively. Rosetta-talaris2014-minimized residues that are scored by Rosetta

5

talaris2014 as greater than 1 standard deviation away from the corresponding native residue

6

are highlighted in red. Heatmaps of per-structure, score-term contribution to Rosetta-determined

18 ACS Paragon Plus Environment

Page 19 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

(D) and Amber-determined (E) false minima and true maxima. The row marked Overall shows

2

the percentage of structures that indicate some degree of implication for that score-term.

3

Per score-term contributions in Amber and Rosetta energy functions

4

We reasoned that insight about the performance and pathologies of each energy

5

function could be gained by identifying the energy terms that are responsible for correct

6

and incorrect evaluations within the same energy function. For example, we asked

7

which terms in the Amber energy function help it avoid mis-scoring a decoy (called

8

Amber true maximum) that is identified as a false minimum in the Rosetta talaris2014

9

landscape (called Rosetta false minimum), and vice versa.

10

We identified terms that contribute to false minima and true maxima by calculating the

11

Z-scores per decoy set and native set for each protein. If the lowest native score-term

12

Z-score is greater than the false minimum score-term by at least one, that term is

13

implicated in that false minimum. The reverse (i.e. true maximum score-term Z-score is

14

greater than the lowest native score-term Z-score by at least one) is true for identifying

15

true maximum contributing score-terms. The heatmap in Figure 3D depicts the fraction

16

of Rosetta talaris2014 false minima decoys (top) and true maxima decoys (bottom) that

17

show some degree of implication for each score-term. This is calculated both on a per-

18

protein basis and over the entire false minima/true maxima sets. Several score-terms,

19

including hbond_sr_bb, fa_dun, fa_rep and omega, are implicated in a majority of false

20

minima in the Rosetta talaris2014 energy function. A set of other score-terms contribute

21

to a majority of Rosetta true maxima (or Amber false minima), including rama,

22

hbond_bb_sc, hbond_sc, p_aa_pp, and fa_elec. These are score-terms that are not

23

usually implicated in Rosetta false minima, thus demonstrating that the score-terms that

24

contribute to the two trends (towards false minima and true maxima) are mutually 19 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 38

1

exclusive. Except fa_elec, the other terms identified as helping “rescue” Amber false

2

minima are all PDB-statistics derived, and it is not surprising that they are implicated in

3

correcting the errors of the more physics-based Amber energy function.

4

We next performed a similar analysis on Amber score-terms for both Amber false

5

minima and Amber true maxima (Figure 3E). We found that bond, angle, and gb are

6

responsible for more than 50% of Amber false minima and that dihedral and elec are

7

implicated in rescuing Rosetta false minima (Amber true maxima). We found that

8

score-terms that are responsible for false minima are not implicated in true maxima and

9

vice versa. Similar to the identification of statistically-derived terms in Rosetta as being

10

responsible for correctly scoring Amber false minima, we find that physics-based terms,

11

i.e., elec (which is counterbalanced by gb) and dihedral potentials, that are orthogonal

12

to the talaris2014 Rosetta energy function, are implicated in the rescue of Rosetta

13

talaris2014 false minima by Amber.

14 15

Combining rankings to select decoys improves decoy selection

16

Based on the results above indicating that use of one energy function can rescue the

17

false minima in the landscape generated by the other energy function due to additional

18

terms or different parameterization of terms, we sought to develop an approach to

19

productively combine the two landscapes for model selection. In model selection (for

20

example in protein structure prediction) the challenge is to select a near-native

21

conformation from a set of decoy conformations based on one or more energy values or

22

other features. Typically, an energy value obtained from a single energy function is

23

used. In the current benchmark set, if model selection is performed by Rosetta

20 ACS Paragon Plus Environment

Page 21 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

REF2015 and Amber ff14SBonlySC energy functions individually, the Rosetta lowest-

2

scored decoy has an RMSD of > 5.0 Å for three out of 140 systems, while the lowest-

3

scored Amber decoy has an RMSD of > 5.0 Å for four other systems. We designed a

4

minimum-sum based algorithm (see Methods) to select a decoy conformation based on

5

both sets of ranks to improve the chances of selecting a near-native decoy.

6

We found that our minimum-sum algorithm generally improved model selection for

7

both Rosetta and Amber rankings (Figure 4D-E). Each minimum-sum selected decoy

8

had an RMSD of < 5.0 Å; thus, the minimum-sum selected decoy rescued all seven

9

systems for which either the Rosetta or the Amber lowest-scored decoy had an RMSD

10

of > 5.0 Å. The minimum-sum selected decoy had an RMSD of at least 1 Å lower than

11

that of the lowest-scoring Rosetta decoy for 13 out of 140 cases and an RMSD of at

12

least 1 Å lower than that of the lowest-scoring Amber decoy for 14 out of 140 cases.

13

More generally, the minimum-sum selected decoy had a lower RMSD than that of the

14

Amber lowest-ranked decoy for 66/140 systems and a lower RMSD than that of the

15

Rosetta lowest-ranked decoy for an overlapping but different set of 66/140 systems.

21 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 38

1 2

Figure 4. Minimum-sum model selection. (A-C) Scatterplots of Rosetta-rank (REF2015) vs.

3

Amber-rank (ff14SBonlySC) for all decoys of 2QY7, 1T2I, and 1SEN respectively. Each point

4

represents one decoy conformation. The set of Pareto solutions is purple, the top-10 ranked

5

Amber decoys are turquoise, the top-10 ranked Rosetta decoys are salmon, and the minimum-

6

sum solution is black. Annotations represent the RMSD in Å from native for the top-ranked

7

Amber decoy (turquoise), top-ranked Rosetta decoy (salmon), and minimum-sum solution

8

(black). Scatterplots that show the efficacy of the minimum-sum solution at minimizing the

22 ACS Paragon Plus Environment

Page 23 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

distance from native relative to the top-ranked Rosetta decoy (D) and top-ranked Amber decoy

2

(E). Each point represents one system. The x-axis is the difference between the minimum-sum

3

solution RMSD from native and the RMSD of the minimum-RMSD decoy conformation, while

4

the y-axis is the difference between the Rosetta lowest-ranked conformation (D) or Amber

5

lowest-ranked conformation (E) RMSD from native and the RMSD of the minimum-RMSD decoy

6

conformation. Points that fall outside the 95% prediction intervals are annotated.

7 8

We examined the false minima cases described above (2QY7, 1T2I, and 1SEN) and

9

found that the minimum-sum decoy generally had lower or equal RMSD as compared to

10

those of Rosetta- or Amber-selected decoys (Figure 4A-C). However, for 1SEN, which

11

contains false minima for Rosetta talaris2014 and Amber ff14SBonlySC but not for

12

Rosetta REF2015, the Rosetta REF2015-selected decoy had a slightly lower RMSD

13

than that of the minimum-sum decoy (1.1 Å vs. 1.5 Å, respectively). Nevertheless, the

14

minimum-sum selected decoy RMSD is significantly lower than that of the Amber-

15

selected decoy (1.5 Å vs. 6.2 Å). Thus, a minimum-sum framework allows combining

16

the two energy functions productively to select a near-native model.

17 18

Loop Modeling

19

The conformational variability of loops plays a multi-functional role in protein structure

20

and function. They are implicated in stability and folding pathways34, binding and active

21

sites35,36, and binding other proteins37,38. Efficient sampling algorithms have been

22

developed36–39, but loop structure prediction efforts can be limited by energy functions,

23

as the energy gaps between loops are smaller and minima are narrow40. Therefore, we

24

tested both Rosetta energy functions and the Amber energy function on a loop modeling 23 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

benchmark (see Methods). In this benchmark, most of the structure remains the same

2

over the set of decoys; the difference lies in a small loop region, which can vary highly

3

in RMSD. The energy gaps between structures are therefore smaller; thus, loop

4

modeling provides a stringent test to distinguish between energy functions.

5

Page 24 of 38

We found that Amber ranked loop conformations considerably more accurately than

6

Rosetta talaris2014, as several systems had higher B values with Amber (Figure 5A,

7

left). REF2015 performance was more even: in 13 cases Amber B-values are

8

significantly (B>0.1) greater than REF2015, whereas the converse is true in 7 cases

9

(Fig. 5A; middle). The Rosetta REF2015 energy function also ranks loops more

10

accurately than the Rosetta talaris2014 energy function, although the improvement is

11

less dramatic (Figure 5A, right) compared to that observed for whole structure decoy

12

discrimination (Fig. 2C). Figure 5B depicts the energy landscapes for a particular case

13

where the Amber B-value was significantly higher than the Rosetta B-values (PDB

14

1CYO). The Amber funnel is steeper than either of the Rosetta funnels, which is

15

reflected in its higher B (0.86 vs. talaris2014 - 0.37, REF2015 – 0.19). We further

16

investigated the energetic differences between the lowest-RMSD and lowest-energy

17

decoys in the REF2015 landscape to elucidate which energy terms lead to this false

18

minimum. We examined two residues of interest, His15 and Asn17 (Fig 5C), which have

19

extensive hydrogen bonding networks in the native and near-native structures

20

compared to the low-energy decoy (Supplementary Figure 5A,B). REF2015 correctly

21

identifies the His15 conformation in the low-RMSD decoy to be of lower energy than the

22

false minimum decoy conformation, with more favorable hydrogen bonding scores

23

(hbond_sc, hbond_lr_bb, hbond_sr_bb) as well as electrostatic and solvation terms

24 ACS Paragon Plus Environment

Page 25 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

(fa_elec, fa_sol), as the histidine sidechain is pointed toward the solvent in low-RMSD

2

decoys and into the protein framework in the false minimum decoy. The energetic

3

preference for this residue in its native-like conformation, however, is reduced by a

4

penalization in the backbone dihedral (called rama_prepro in REF2015) term

5

(Supplementary Figure 5C). For Asn17, the native-like decoy has a slightly energetically

6

unfavorable conformation according to REF2015 that arises from a higher solvation

7

(fa_sol) and rama_prepro score, despite the presence favorable hydrogen bonding and

8

electrostatic terms (hbond_bb_sc, hbond_sr_bb, fa_elec, fa_atr) (Supplementary Figure

9

5D). Furthermore, the native-like loop for 1CYO scores more poorly compared to the

10

false-minimum decoy across all 12 residues in the rama_prepro term, the amino acid

11

probability (p_aa_pp), and sidechain energy (fa_dun) terms (Supplementary Figure

12

5E). Several pre-loop residues (five residues on the N-terminal side of the loop region)

13

also score poorly in the low-RMSD structure, mostly due to large differences in the

14

electrostatic (fa_elec) and hydrogen bonding terms (Supplementary Figure 5F). These

15

results suggest that improper balancing of the backbone and sidechain dihedral,

16

solvation and electrostatic/hydrogen bonding terms is responsible for the observed false

17

energy minimum in the Rosetta landscape for 1CYO.

18

We similarly examined the lowest-RMSD and lowest-energy decoys in the Amber

19

landscape to determine which ff14SBonlySC terms lead to the false minimum in the

20

case of PDB 1MSC: a case where Rosetta correctly discriminates against non-native

21

conformations but Amber does not. Comparing the differences between the energy

22

terms calculated over all residues for each structure, we found significantly unfavorable

23

changes in the electrostatic (elec) and van der Waals (vdw) terms in the low-RMSD

25 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 38

1

decoy (Supplementary Figure 6A) in Amber scoring. To specifically probe the

2

contributions of the loop region and its effect on these overall energy differences, we

3

compared the per-residue ff14SBonlySC energy decompositions for the low-RMSD

4

decoy to those of the false-minimum. Although the GB term is not pair-wise

5

decomposable over individual residues, comparing the differences for each of these

6

terms can allow for a qualitative comparison of the two loop regions. We find that,

7

although the majority of the near-native loop residues (eight out of 12) exhibit favorable

8

total energies in the low-RMSD decoy, two residues have significantly unfavorable

9

overall energies: Thr15 and Thr19. These residues and the C-terminal lobe of the loop

10

in the false-minimum decoy are positioned such that an alternative hydrogen bond

11

network is found between the loop and other portions of the protein (Supplementary

12

Figure 6B), which weakens the energetic advantage of the near-native loop

13

conformation. Furthermore, the false-minimum conformation allows for Thr15 to

14

participate in more hydrogen bonds than the near-native conformation, where it has no

15

hydrogen bond partners. This gives the false-minimum a considerably more favorable

16

electrostatics score for this residue (Supplementary Figure 6C). Similarly, Thr19 in the

17

false-minimum structure participates in more hydrogen bonds than in the low-RMSD

18

decoy, therefore allowing the false-minimum decoy to again have a more favorable

19

electrostatics score for this residue (Supplementary Figure 6D). Together, these results

20

suggest an imbalance between the solvation terms (gb term when comparing whole

21

decoys and polar solvation term in the per-residue decomposition) and electrostatic

22

terms are responsible for misprediction in this case.

26 ACS Paragon Plus Environment

Page 27 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1 2

Figure 5. Loop modeling benchmark. (A) General performance of Rosetta talaris2014 scoring

3

function vs. Amber scoring function (left), Rosetta REF2015 scoring function vs. Amber scoring

27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 38

1

function (middle), and Rosetta talaris2014 scoring function vs. Rosetta REF2015 scoring

2

function (right) over the entire loop modeling set. Each dot represents the B metric for one

3

system. The black line is x=y and the dashed line represents B-values that are +/- 0.1 from the

4

x=y line. Decoy sets discussed in more depth are annotated with the PDB ID of that set. (B)

5

Energy landscapes for 1CYO. Each dot on the plot represents one decoy conformation. The x-

6

axis is RMSD from native and the y-axis is normalized energy. The B metric, which represents

7

the efficacy of the energy function at differentiating between native and non-native decoys, is

8

shown at the top right corner of each plot. Rosetta talaris2014 and REF2015 plots are on the

9

left (salmon) and middle (purple), respectively, and the Amber plot is on the right (turquoise).

10

The lowest-energy decoy conformation in each plot is shown in green and the highest-energy

11

decoy conformation is shown in red. (C) Similar plot as (B) for 1MSC. The y-axis has been

12

truncated to show normalized energies less than 2.0. (D) Loop conformations for 1CYO in the

13

native structure (gray) and the lowest-energy decoys from REF2015 (purple) and Amber

14

(turquoise). (E) Loop conformations for 1MSC in the native structure (gray) and lowest-energy

15

decoys from REF2015 (purple) and Amber (turquoise).

16

Discussion

17

Systematic comparison of Amber ff14SBonlySC (a physically-derived energy function)

18

and Rosetta talaris2014 and Rosetta REF2015 (both physical and statistical based)

19

reveals the strengths and weaknesses of each energy function. Generally, Amber

20

ff14SBonlySC performs better than Rosetta talaris2014 at both decoy discrimination

21

and loop modeling. The comparison of Amber ff14SBonlySC to Rosetta REF2015 (the

22

current default Rosetta energy function) reveals that REF2015, which has more

23

physically-derived terms than talaris2014, performs comparably well or better in certain

24

cases when compared to Amber ff14SBonlySC. Examination of Rosetta energy

25

function score-terms that rescue Amber ff14SBonlySC false minima and Amber 28 ACS Paragon Plus Environment

Page 29 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

ff14SBonlySC score-terms that correct Rosetta false minima reveals two possible

2

sources for the performance improvement of REF2015: improved dihedral potential and

3

electrostatics model. While two of the Rosetta score-terms and two of the Amber score-

4

terms that contribute to the correction of false minima are counterparts of each other

5

(Amber dihedral and Rosetta rama, and Amber elec and Rosetta fa_elec), differences in

6

their derivation and parameterization appear to influence the propensity of each energy

7

function toward false minima. Although rama and dihedral both score the propensity of

8

the backbone dihedral angles, rama does so to fit dihedral distributions in native protein

9

structures while dihedral is based on fits to quantum chemistry data on small molecules.

10

Similarly, while both elec and fa_elec are derived from a Coulombic model, yet they are

11

differently parameterized; the Amber elec is parameterized via small-molecule

12

properties, whereas fa_elec is optimized on larger biomolecular structures. The

13

improvement of Rosetta REF2015 over Rosetta talaris2014 may be caused by its

14

greater inclusion of physical-derived terms (bond, angle, etc.) and/or its extensive

15

parameterization on both small-molecule properties (more typical of so-called “physics-

16

based” force fields) and larger biomolecular structures. Conversely, the training of

17

ff14SBonlySC solvation parameters22 using the computed energy landscapes of a

18

diverse set of proteins, which is more typical of so-called “knowledge-based” force field

19

development, may contribute to its success observed here. Thus, a promising direction

20

for force field development involves combining both small molecule and macromolecular

21

benchmarks, using fitting to diverse experimental and simulation data.

22 23

We note that the sampling method used for decoy generation and choice of benchmark sets also contribute to the apparent performance of a given energy function.

29 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 38

1

In our tests, all decoys were generated by extensive sampling using Rosetta

2

talaris2014, so the pathologies of this energy function are more likely to be highlighted.

3

REF2015 was developed to overcome these pathologies using the same decoy sets

4

(used either for training or evaluation), some pathologies remain as evidenced by loop

5

modeling tests (Fig. 5) which were not considered by Park et al.18 It will be interesting to

6

perform the converse experiment: extensive sampling with Amber ff14SBonlySC

7

followed by Rosetta scoring may reveal the pathologies of the former and aid in its

8

further development. Thus, iterative sampling and cross-scoring with differently derived

9

energy functions of sufficiently high quality may be a generally valuable approach for

10 11

force-field improvement. Model selection, or the ability to select a near-native decoy from a set of decoy

12

conformations is a general problem in protein structure prediction. If low-energy decoys

13

exist in false minima in the energy landscape, it is difficult to identify conformations that

14

are near-native. Since Amber and Rosetta provide different, semi-orthogonal

15

information, a framework to combine the two rankings enables more efficient

16

identification of near-native decoys. The minimum-sum based algorithm that we have

17

implemented improves model selection for 9% of structures over Rosetta REF2015

18

model selection and 10% of structures over Amber ff14SBonlySC model selection

19

(Figure 4, Table 1). The model selection algorithm is extensible to any two sets of

20

energy functions or model ranks for one set of models and can thus be used to combine

21

any two sources of information to produce meaningful improvements in near-native

22

decoy selection. Similarly, scores from three or more energy functions can be combined

23

using a Pareto optimality framework.

30 ACS Paragon Plus Environment

Page 31 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

The approach described here should enable comparative analysis and combination of

2

future versions of both Amber and Rosetta scoring functions, and enable a variety of

3

biomolecular modeling tasks.

4 5

ASSOCIATED CONTENT

6

Supporting Information.

7

The following files are available free of charge.

8

LoopDefs: definitions for the loops in the loop modeling benchmark (xlsx)

9

Supplementary_Software: descriptions of software and scripts used in Methods (docx)

10

Figures: S1, False minima contact maps and structures; S2, Energy landscapes for all

11

decoy discrimination systems; S3, plot of minimum, minimum-sum, Rosetta, and Amber

12

RMSDs; S4, Energy landscapes for all loop modeling systems; S4, Comparison

13

between the lowest energy decoy and lowest RMSD decoy for 1CYO. Tables: S1, B-

14

metric values for Amber and Rosetta decoy discrimination systems; S2, values of

15

RMSD for minimum-sum-selected, Rosetta-selected, and Amber-selected models as

16

well as the actual lowest-RMSD value; S3, B-metric values for Amber and Rosetta loop

17

modeling systems (PDF)

18

AUTHOR INFORMATION

19

Corresponding Author

20

*Emails: [email protected], [email protected]

21

Present Addresses

22

†Schrodinger, Inc. 120 West 45th Street, 17th Floor, Tower 45, New York, NY 10036 31 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 38

1

Author Contributions

2

ABR, KMB, SDK and DAC wrote the manuscript. All authors have given approval to the

3

final version of the manuscript.

4

Funding Sources

5

This material is based upon work supported by RosettaCommons (grant to SDK and

6

DAC) and the National Science Foundation Graduate Research Fellowship under Grant

7

No. DGE-1433187 (ABR).

8

ACKNOWLEDGMENT

9

We thank F. DiMaio, H. Park for the REF2015 dataset and S. O’Connor and T.

10

Kortemme for the loop modeling dataset.

11

ABBREVIATIONS

12

RMSD, root-mean-square-deviation; REF2015, Rosetta Energy Function 2015; LBFGS,

13

Limited-memory Broyden-Fletcher-Goldfarb-Shanno; lbfgs_armijo_nonmonotone,

14

LBFGS minimizer implementation with inexact line search conditions; GB-neck2,

15

generalized Born implicit solvent model.

16 17

REFERENCES

18

(1)

1973, 181 (4096), 223–230.

19 20

Anfinsen, C. B. Principles That Govern the Folding of Protein Chains. Science

(2)

Lu, H.; Skolnick, J. A Distance-Dependent Atomic Knowledge-Based Potential for

21

Improved Protein Structure Selection. Proteins Struct. Funct. Genet. 2001, 44 (3),

22

223–232. 32 ACS Paragon Plus Environment

Page 33 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

(3)

Brooks, B. R.; Brooks, C. L.; Mackerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.;

2

Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; et al. CHARMM: The

3

Biomolecular Simulation Program. J. Comput. Chem. 2009, 30 (10), 1545–1614.

4

(4)

Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.;

5

Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation

6

Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules.

7

J. Am. Chem. Soc. 1995, 117 (19), 5179–5197.

8

(5)

Current Opinion in Structural Biology. 1996, pp 195–209.

9 10

(6)

(7)

Ponder, J. W.; Case, D. A. Force Fields for Protein Simulations. Advances in Protein Chemistry. 2003, pp 27–85.

13 14

Shen, M.-Y.; Sali, A. Statistical Potential for Assessment and Prediction of Protein Structures. Protein Sci. 2006, 15 (11), 2507–2524.

11 12

Jernigan, R. L.; Bahar, I. Structure-Derived Potentials and Protein Simulations.

(8)

Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of Protein Tertiary

15

Structures from Fragments with Similar Local Sequences Using Simulated

16

Annealing and Bayesian Scoring Functions. J. Mol. Biol. 1997, 268 (1), 209–225.

17

(9)

Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of

18

the OPLS All-Atom Force Field on Conformational Energetics and Properties of

19

Organic Liquids. J. Am. Chem. Soc. 1996, 118 (45), 11225–11236.

20

(10) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.;

21

Simmerling, C. Ff14SB: Improving the Accuracy of Protein Side Chain and

22

Backbone Parameters from Ff99SB. J. Chem. Theory Comput. 2015, 11 (8),

23

3696–3713.

33 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

(11) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.;

2

Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; et al. All-Atom Empirical

3

Potential for Molecular Modeling and Dynamics Studies of Proteins †. J. Phys.

4

Chem. B 1998, 102 (18), 3586–3616.

5

(12) Xu, D.; Zhang, Y. Ab Initio Protein Structure Assembly Using Continuous

6

Structure Fragments and Optimized Knowledge-Based Force Field. Proteins

7

Struct. Funct. Bioinforma. 2012, 80 (7), 1715–1735.

8 9

(13) O’Meara, M. J.; Leaver-Fay, A.; Tyka, M. D.; Stein, A.; Houlihan, K.; Dimaio, F.; Bradley, P.; Kortemme, T.; Baker, D.; Snoeyink, J.; et al. Combined Covalent-

10

Electrostatic Model of Hydrogen Bonding Improves Structure Prediction with

11

Rosetta. J. Chem. Theory Comput. 2015, 11 (2), 609–622.

12

(14) Shapovalov, M. V.; Dunbrack, R. L. A Smoothed Backbone-Dependent Rotamer

13

Library for Proteins Derived from Adaptive Kernel Density Estimates and

14

Regressions. Structure 2011, 19 (6), 844–858.

15

Page 34 of 38

(15) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A.

16

D. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting

17

Improved Sampling of the Backbone φ, ψ and Side-Chain χ(1) and Χ2 Dihedral

18

Angles. J. Chem. Theory Comput. 2012, 8 (9), 3257–3273.

19

(16) Raval, A.; Piana, S.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Refinement of

20

Protein Structure Homology Models via Long, All-Atom Molecular Dynamics

21

Simulations. Proteins Struct. Funct. Bioinforma. 2012, 80 (8), 2071–2079.

22 23

(17) Lindorff-Larsen, K.; Maragakis, P.; Piana, S.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Systematic Validation of Protein Force Fields against Experimental Data.

34 ACS Paragon Plus Environment

Page 35 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2

Journal of Chemical Theory and Computation

PLoS One 2012, 7 (2). (18) Park, H.; Bradley, P.; Greisen, P.; Liu, Y.; Mulligan, V. K.; Kim, D. E.; Baker, D.;

3

Dimaio, F. Simultaneous Optimization of Biomolecular Energy Functions on

4

Features from Small Molecules and Macromolecules. J. Chem. Theory Comput.

5

2016, 12 (12), 6201–6212.

6

(19) Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R.; O’Meara, M. J.; DiMaio, F. P.; Park,

7

H.; Shapovalov, M. V.; Renfrew, P. D.; Mulligan, V. K.; Kappel, K.; et al. The

8

Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J.

9

Chem. Theory Comput. 2017, 13 (6), 3031–3048.

10

(20) Beauchamp, K. A.; Lin, Y. S.; Das, R.; Pande, V. S. Are Protein Force Fields

11

Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements. J.

12

Chem. Theory Comput. 2012, 8 (4), 1409–1414.

13

(21) Cino, E. A.; Choy, W. Y.; Karttunen, M. Comparison of Secondary Structure

14

Formation Using 10 Different Force Fields in Microsecond Molecular Dynamics

15

Simulations. J. Chem. Theory Comput. 2012, 8 (8), 2725–2740.

16

(22) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved Generalized Born Solvent

17

Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9 (4),

18

2020–2034.

19

(23) Nguyen, H.; Maier, J.; Huang, H.; Perrone, V.; Simmerling, C. Folding Simulations

20

for Proteins with Diverse Topologies Are Accessible in Days with a Physics-Based

21

Force Field and Implicit Solvent. J. Am. Chem. Soc. 2014, 136 (40), 13959–

22

13962.

23

(24) Conway, P.; Tyka, M. D.; DiMaio, F.; Konerding, D. E.; Baker, D. Relaxation of

35 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 38

1

Backbone Bond Geometry Improves Protein Energy Landscape Modeling. Protein

2

Sci. 2014, 23 (1), 47–55.

3

(25) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.;

4

Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000,

5

28, 235–242.

6

(26) Tyka, M. D.; Keedy, D. A.; Andre, I.; Dimaio, F.; Song, Y.; Richardson, D. C.;

7

Richardson, J. S.; Baker, D. Alternate States of Proteins Revealed by Detailed

8

Energy Landscape Mapping. J. Mol. Biol. 2011, 405 (2), 607–618.

9

(27) Tyka, M. D.; Jung, K.; Baker, D. Efficient Sampling of Protein Conformational

10

Space Using Fast Loop Building and Batch Minimization on Highly Parallel

11

Computers. J. Comput. Chem. 2012, 33 (31), 2483–2491.

12

(28) Case, D. A.; Betz, R. M.; Cerutti, D. S.; Cheatham III, T. E.; Darden, T. A.; Duke,

13

R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; et al. AMBER 2016;

14

University of California: San Francisco, 2016.

15 16 17 18 19 20 21 22 23

(29) Liu, D. C.; Nocedal, J. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program. 1989, 45 (1–3), 503–528. (30) Nguyen, H.; Roe, D. R.; Swails, J. M.; Case, D. A. PYTRAJ: Interactive Data Analysis for Molecular Dynamics Simulations. Manuscr. Prep. 2017. (31) Dill, K. a; MacCallum, J. L. The Protein-Folding Problem, 50 Years On. Science 2012, 338 (6110), 1042–1046. (32) Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Navigating the Folding Routes. Science 1995, 267 (5204), 1619–1620. (33) Shakhnovich, E. Protein Folding Thermodynamics and Dynamics: Where Physics,

36 ACS Paragon Plus Environment

Page 37 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2

Journal of Chemical Theory and Computation

Chemistry, and Biology Meet. Chemical Reviews. 2006, pp 1559–1588. (34) Xiang, Z.; Soto, C. S.; Honig, B. Evaluating Conformational Free Energies: The

3

Colony Energy and Its Application to the Problem of Loop Prediction. Proc. Natl.

4

Acad. Sci. U. S. A. 2002, 99, 7432–7437.

5 6 7

(35) Fiser, A.; Kinh Gian Do, R.; Sali. Modeling Loops in Protein Structures. Protein Sci. 2000, 9, 1753–1773. (36) Murphy, P. M.; Bolduc, J. M.; Gallaher, J. L.; Stoddard, B. L.; Baker, D. Alteration

8

of Enzyme Specificity by Computational Loop Remodeling and Design. Proc. Natl.

9

Acad. Sci. U. S. A. 2009, 106 (23), 9215–9220.

10 11

(37) Wang, C.; Bradley, P.; Baker, D. Protein-Protein Docking with Backbone Flexibility. J. Mol. Biol. 2007, 373 (2), 503–519.

12

(38) Mandell, D. J.; Coutsias, E. a; Kortemme, T. Sub-Angstrom Accuracy in Protein

13

Loop Reconstruction by Robotics-Inspired Conformational Sampling. Nat.

14

Methods 2009, 6 (8), 551–552.

15

(39) Ollikainen, N.; Smith, C. A.; Fraser, J. S.; Kortemme, T. Flexible Backbone

16

Sampling Methods to Model and Design Protein Alternative Conformations.

17

Methods Enzymol. 2013, 18 (9), 1199–1216.

18 19

(40) Stein, A.; Kortemme, T. Improvements to Robotics-Inspired Conformational Sampling in Rosetta. PLoS One 2013, 8 (5).

20 21 22 23

37 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 38

1 2 3 4

For Table of Contents Only

5

38 ACS Paragon Plus Environment