DCA-MOL - a PyMOL Plugin to Analyze Direct Evolutionary Couplings


Application Note

DCA-MOL - a PyMOL Plugin to Analyze Direct Evolutionary Couplings Aleksandra Jarmolinska, Qin Zhou, Joanna Ida Sulkowska, and Faruck Morcos J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00690 • Publication Date (Web): 11 Jan 2019 Downloaded from http://pubs.acs.org on January 12, 2019


Journal of Chemical Information and Modeling

Aleksandra I. Jarmolinska,†,‡ Qin Zhou,¶ Joanna I. Sulkowska,∗,†,§ and Faruck Morcos∗,‖,¶

†Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
‡College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, Banacha 2c, 02-097 Warsaw, Poland
¶Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
§Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
‖Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA

E-mail: [email protected]; [email protected]

Abstract

Direct Coupling Analysis (DCA) is a statistical modeling framework designed to uncover relevant molecular evolutionary relationships from biological sequences. Although DCA has been used successfully in several applications, mapping and visualizing evolutionary couplings and direct information for a particular set of molecules requires multiple steps and can be error-prone. DCA-MOL extends PyMOL functionality to allow users to interactively analyze and visualize coevolutionary residue-residue interactions, both as contact maps and on structures. True positive rates for the top N pairs can be computed and visualized in real time to evaluate the quality of residue-residue contact predictions. Different types of interactions in monomeric proteins, RNA, molecular interfaces, protein conformational dynamics and multiple protein complexes can be studied efficiently within one application. DCA-MOL is available for download from http://dca-mol.cent.uw.edu.pl

Introduction

Direct Coupling Analysis (DCA) 1,2 refers to a group of methods that use statistical modeling to quantify evolutionary relationships between specific positions within biological sequences (i.e. proteins and nucleic acids). The main idea behind DCA is that the evolutionary pressure to maintain functional or structural residue interactions is reflected as direct correlations in multiple sequence alignments (MSAs) of related sequences. Therefore, inferring a joint probability distribution of the aligned sequences, and examining its pairwise (marginal) terms, is useful to identify important pairwise interactions in biomolecules. DCA improves on standard correlation methods (such as Mutual Information 3–6) by disentangling the direct coupling of a given residue-residue pair from the indirect effects imposed on it by the rest of the sequence. 7 One metric, among others, to quantify such effects is called Direct Information (DI). 1 DCA can indicate which pairwise correlations between sequence positions are evolutionarily important. As such, it can be used to study various types of interactions, in particular those related to structural conformations. For both proteins and nucleic acids, coevolutionary measures have typically been used for the following applications: 8

• Structure prediction. Specific patterns on a DI plot can indicate secondary and tertiary structural elements. Highly correlated, yet sequentially distant, residues are expected to be close in 3D space and are used to predict protein and RNA tertiary structures. 9–12
• Protein conformational dynamics. High DI between regions that are distant in a known structure may indicate that the protein can undergo large-scale functional rearrangements. Superposing different DI plots and contact maps can be used to uncover the complete functional conformational landscape of a protein. 13


• Analysis of folding pathways. DCA can be used to predict important folding parameters (such as folding rates or φ values). 14 Direct couplings can also be the driving force behind proper folding of a protein, even for proteins with non-trivial topology, 15 such as knotted proteins. 16
• Binding site identification. Important regions such as binding sites, which may be preserved during evolution, can be identified with DI and predicted to be of functional value. 17
• Interaction interface inference. Residues in molecular interfaces also coevolve, so coevolutionary couplings can be used to infer interfaces in both homo- and heterodimeric systems. 14,18–20 Molecular interface detection can be the basis for, e.g., recognition, binding mode prediction, specificity determination, and druggable interface identification.
• Interaction partner prediction. Significant coevolutionary patterns between two different protein families could indicate a direct interaction between them. 21,22

Here we introduce DCA-MOL, a PyMOL plugin that streamlines the evolutionary analysis of molecular sequences with DCA. DCA-MOL maps coupling scores onto user-specified structures, calculates contact maps using various criteria, and visualizes selected interactions on the structure. DCA-MOL can be used to analyze various quantifiable interactions on a broad spectrum of structures. By combining PyMOL's 23 molecular graphics software with interactive visualization of coevolutionary information, DCA-MOL constitutes a unique application for studying direct couplings right on the molecules in question. Traditionally, to visualize and analyze DCA results, users have had to perform multiple mappings and alignments to relate DI pairs to the available structures for a specific sequence from the MSA.
This process is complicated by residues that may be missing from either or both of the sequences, and it requires another program (or inefficient manual selection) to compare the detected couplings with features and physical contacts on the reference structure.


DCA-MOL performs all of these tasks automatically within one application, allowing users to concentrate on the most important part: uncovering biologically relevant interactions preserved through evolution.
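The mapping step described above can be illustrated with a small Python sketch. It assumes that the DI file indexes the columns of one aligned row from 1, that gaps are written as `-` or `.`, and that the chain sequence and residue numbers have already been extracted from the structure. `difflib.SequenceMatcher` is only a simple stand-in for a proper pairwise alignment, and all function names here are hypothetical, not DCA-MOL's own code.

```python
from difflib import SequenceMatcher

GAP_CHARS = "-."  # assumed gap symbols in the aligned FASTA row


def column_to_seqpos(aligned_row):
    """Map 1-based alignment columns to 1-based positions in the ungapped sequence."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_row, start=1):
        if ch in GAP_CHARS:
            continue  # gap columns have no residue
        pos += 1
        mapping[col] = pos
    return mapping


def seqpos_to_resnum(seq, chain_seq, chain_resnums):
    """Map 1-based positions in seq to residue numbers of a structure chain.

    Residues missing from the structure simply receive no entry.
    """
    matcher = SequenceMatcher(None, seq, chain_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k + 1] = chain_resnums[block.b + k]
    return mapping


def column_to_resnum(aligned_row, chain_seq, chain_resnums):
    """Compose both maps: alignment column -> structure residue number."""
    col2pos = column_to_seqpos(aligned_row)
    seq = "".join(ch for ch in aligned_row if ch not in GAP_CHARS).upper()
    pos2res = seqpos_to_resnum(seq, chain_seq, chain_resnums)
    return {col: pos2res[p] for col, p in col2pos.items() if p in pos2res}


# Hypothetical example: residue 13 (an Ala) is unresolved in the structure.
row = "MKT-AYIAK"
chain_seq = "MKTYIAK"
chain_resnums = [10, 11, 12, 14, 15, 16, 17]
print(column_to_resnum(row, chain_seq, chain_resnums))
# {1: 10, 2: 11, 3: 12, 6: 14, 7: 15, 8: 16, 9: 17}
```

Real data would call for a substitution-aware aligner; the point of the sketch is the two-step composition (column → sequence position → residue number) and how unresolved residues drop out of the map.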

Related software

While DCA-MOL is, to the best of our knowledge, the only stand-alone visualization software for residue couplings, some coevolutionary web servers also offer visualization capabilities, e.g. BIS2Analyzer (http://www.lcqb.upmc.fr/BIS2Analyzer), 24 Gremlin (http://gremlin.bakerlab.org/), 25 EVfold 26 (http://evfold.org/) and RaptorX (http://raptorx.uchicago.edu/). 27 The main purpose of DCA-MOL is to provide a local, interactive tool that takes advantage of PyMOL's features and allows users to create publication-quality images for reports and articles.

Methods

Implementation Details

DCA-MOL is implemented as a PyMOL plugin and is available for Linux, macOS and Microsoft Windows. It allows users to interactively analyze and visualize coevolutionary residue-residue interactions, both as contact maps and projected onto structures, with different analysis modes, including single monomeric states, interface analysis and multiple state analysis (Fig. 1). The source code of DCA-MOL, further descriptions and usage instructions can be found at http://dca-mol.cent.uw.edu.pl.

[Figure 1 schematic: Input (DI score, structure, alignment) → Analysis type selection (single state, interface, multiple states) → Perform analysis in DCA-MOL (native contacts comparison mode; recolor by true/false positives; interactively selecting interactions on contact maps and 3D structures) → Output (figures of contact map and structures; list of selected bonds with distances; true positive rate for a selected proportion of top DI pairs). Example shown: 5dn6 chain A.]

Figure 1: Pipeline of DCA-MOL. DCA-MOL is a PyMOL plugin designed to interactively visualize coevolutionary residue-residue interactions estimated from DCA in contact maps and 3D structures. DCA-MOL takes three files as input: a Direct Information (DI) score file, a PDB file with 3D coordinates and a FASTA formatted alignment file. By selecting among the different types of analyses, we can study amino acid or nucleotide interactions inside a single protein/RNA or across protein interfaces. The multiple state mode is used to study protein conformational dynamics and multiple protein complexes. The default interaction map is a plot of DI scores for all residue-residue pairs. DCA-MOL can interactively visualize residue-residue maps and 3D structures. For each DI map, the user can specify different thresholds for the top DI pairs, automatically compute true positive pairs by comparison against native contacts, and select user-defined pairs. Results of the analysis can be saved as images or data files.

Inputs: The main functionality of DCA-MOL is the visualization of coevolutionary residue-residue interactions via automatic mapping of DCA results (or the results of other inference procedures and metrics) onto user-selected structures. To do this, the user must provide a tab-delimited file containing at least three columns, which corresponds to the typical output of DCA programs; any line that does not consist of at least three numerical columns is skipped. The first column indicates one residue position in the sequence, the second column the coupled sequence position, and the third column the DI score for that pair. The file does not have to be sorted, and if it contains more than three columns (e.g. other score types), the user will be prompted to indicate which column contains the metric to visualize. To ensure proper mapping between scores and a specific structure, a FASTA formatted alignment file must be provided; it is used to relate indices from the score file to the sequence. This should be the input file used for the DCA calculations (although only the sequences for which structures will be analyzed need to be present). Providing the structure as a coordinate file in a PyMOL-recognized format is suggested but not necessary; the structure can already be loaded into PyMOL, or added through DCA-MOL either from a file or by direct download from the RCSB PDB. 28 When no structure is provided, only the interactive plot of the DI scores is shown.

Output: By default, DCA-MOL performs an automated mapping for each molecule separately, returning an interactive map of coupled residues and a 3D structure visualization. These plots can be easily customized by zooming, modifying colormaps and changing value ranges, and can be exported in multiple formats. For proteins and RNAs, secondary structure features are marked next to the plot. Users can select regions of interest in the map or on the structure, and the selected residue-residue interactions are highlighted in both views. The Show list of selected bonds option provides more detailed information on these interactions, such as position indices, specific residues and distances in the coordinates.
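A minimal Python sketch of the score-file rule described under Inputs: lines without at least three numeric fields are skipped and pairs are sorted by score, since the file need not be sorted. The function name is hypothetical, the `score_col` argument (0-based) is an assumed stand-in for DCA-MOL's interactive column prompt, and defaulting to the last column is an assumption rather than documented plugin behavior.

```python
import io


def read_scores(lines, score_col=None):
    """Parse DCA-style output (i, j, ..., score) into (i, j, score) tuples.

    Lines that do not consist of at least three numeric fields are skipped.
    score_col is a 0-based column index; by default the last column is used.
    Pairs are returned sorted by decreasing score.
    """
    pairs = []
    for line in lines:
        fields = line.split()  # handles tabs as well as spaces
        if len(fields) < 3:
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or comment line
        col = len(values) - 1 if score_col is None else score_col
        pairs.append((int(values[0]), int(values[1]), values[col]))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Hypothetical four-column file (i, j, MI, DI) with a header line.
sample = io.StringIO(
    "# i\tj\tMI\tDI\n"
    "1\t5\t0.12\t0.30\n"
    "2\t9\t0.05\t0.45\n"
    "3\t7\t0.20\t0.10\n"
)
print(read_scores(sample))
# [(2, 9, 0.45), (1, 5, 0.3), (3, 7, 0.1)]
```

With a real file, `read_scores(open(path))` works the same way, because iterating a text file yields lines just like the `StringIO` object above.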

Basic analysis: Native contact maps for the selected structures are calculated automatically, using various distance and atom type criteria. These maps can be compared with the DI values for a given structure through coloring, side-by-side comparison (Native contacts comparison mode), or by overlaying both plots (in Figure 1, the lower triangle is the native contact map and the upper triangle is the DI contact map). A true positive rate can be calculated automatically in real time to evaluate the quality of contact predictions. DI pairs can be recolored as true or false positives to show how well they overlap with the native fold of the molecule, and false positives can be toggled off to show only true positives. Selecting data points on the plot adds colored bonds to the structure visualized in the main PyMOL window. In the ATP synthase protein example, selected interactions are marked with a red rectangle in the DI map and red lines in the 3D structure (Fig. 1). Selection works both ways: pairs can be chosen by selecting two non-overlapping regions on the visualized structure or by making a selection on the residue-residue plot. Both the contact-indicating bonds added to the structure and the plot selections can be kept (and remapped) between different plot types and structures.
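The comparison against native contacts can be sketched in Python as follows. The 8 Å cutoff, the minimum-distance criterion over whichever atoms the caller supplies (e.g. only Cα, or all heavy atoms), the optional sequence-separation filter and the function names are illustrative assumptions. The true positive rate is computed as the fraction of the top N DI pairs (already mapped to structure residue numbers) that are native contacts; this is a common convention, not DCA-MOL's documented formula.

```python
import math
from itertools import combinations


def native_contacts(residues, cutoff=8.0, min_separation=1):
    """Return the set of residue pairs (i, j), i < j, in contact.

    residues maps a residue number to a list of (x, y, z) atom coordinates;
    a pair is a contact if its closest atom-atom distance is below cutoff (Å).
    """
    contacts = set()
    for i, j in combinations(sorted(residues), 2):
        if j - i < min_separation:
            continue  # skip sequence neighbours if requested
        d = min(math.dist(p, q) for p in residues[i] for q in residues[j])
        if d < cutoff:
            contacts.add((i, j))
    return contacts


def true_positive_rate(ranked_pairs, contacts, top_n):
    """Fraction of the top_n highest-scoring (i, j, score) pairs that are contacts."""
    top = ranked_pairs[:top_n]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in contacts for i, j, _ in top)
    return hits / len(top)


# Toy coordinates: one atom per residue along the x axis.
residues = {1: [(0, 0, 0)], 2: [(3, 0, 0)], 3: [(20, 0, 0)], 4: [(24, 0, 0)]}
contacts = native_contacts(residues)
ranked = [(2, 1, 0.9), (1, 3, 0.5), (3, 4, 0.4)]  # sorted by decreasing DI
print(true_positive_rate(ranked, contacts, top_n=2))  # 0.5
```

Sweeping `top_n` over a range of values reproduces the kind of curve that lets one judge how quickly precision drops as lower-ranked DI pairs are included.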

Molecular interface analysis: DCA for interfaces is usually performed on concatenated MSAs of interacting chains. In such a case, a single sequence from the alignment corresponds to two structures (one per chain). To analyze such data with DCA-MOL, users can load several structures to be mapped consecutively onto any given sequence.

Protein interfaces between similar domains: If both chains/domains that make up the interface belong to the same family, they may occupy the same columns of the alignment (as two separate entries rather than being concatenated). In such a case, users can mark them as Part of an interface during input selection. The interface between them then appears in the list of molecules available for analysis, with its own interchain contact map.

Interface positioning: DCA can be used to detect interchain residue pairs that form part of the interface. By combining PyMOL structure editing tools with DCA-MOL contact map recalculation, it is possible to rearrange the interfacing molecules so that their interchain contact map reflects the DCA-derived interactions as closely as possible.
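For the concatenated-alignment case, the index bookkeeping can be sketched as below. The sketch assumes that chain A occupies alignment columns 1..`len_a` and chain B the columns after it, and that pairs arrive as (i, j, score) in concatenated column numbering; `split_concatenated` and `interface_agreement` are hypothetical helpers. The agreement score (fraction of the top N interchain pairs found in an interchain contact set) is one simple way to quantify how closely a rearranged interface reflects the couplings, not a feature documented for DCA-MOL.

```python
def split_concatenated(pairs, len_a):
    """Split (i, j, score) pairs from an A+B concatenated alignment.

    Returns intra-A, intra-B and interchain lists; chain B positions are
    renumbered from 1, and interchain pairs are ordered as (pos_in_A, pos_in_B).
    """
    intra_a, intra_b, inter = [], [], []
    for i, j, score in pairs:
        i, j = min(i, j), max(i, j)
        if j <= len_a:
            intra_a.append((i, j, score))
        elif i > len_a:
            intra_b.append((i - len_a, j - len_a, score))
        else:
            inter.append((i, j - len_a, score))
    return intra_a, intra_b, inter


def interface_agreement(inter_pairs, inter_contacts, top_n):
    """Fraction of the top_n interchain pairs present in the interchain contact set."""
    top = sorted(inter_pairs, key=lambda p: p[2], reverse=True)[:top_n]
    if not top:
        return 0.0
    return sum((a, b) in inter_contacts for a, b, _ in top) / len(top)


pairs = [(3, 12, 0.9), (2, 5, 0.8), (11, 14, 0.7), (15, 4, 0.6)]
intra_a, intra_b, inter = split_concatenated(pairs, len_a=10)
print(inter)                                     # [(3, 2, 0.9), (4, 5, 0.6)]
print(interface_agreement(inter, {(3, 2)}, 2))   # 0.5
```

When rearranging chains interactively, recomputing the interchain contact set after each move and re-evaluating such a score gives a single number to track.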

Multimodel mode analysis: Structure coordinate files may contain multiple models, e.g. NMR ensembles, trajectories from molecular dynamics simulations, or proteins with multiple conformations (e.g. apo/holo). Such models can be loaded into DCA-MOL as additional states. A separate contact map is calculated for each state, as well as a “minimal” contact map over all states. This minimal map records the shortest distance observed between each pair of residues across all available states (i.e. it shows whether a given residue pair is in contact in any state).
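The “minimal” map described above amounts to an element-wise minimum over states. A minimal Python sketch, assuming each state has already been reduced to a dictionary of residue-pair distances (for instance with a minimum atom-atom distance criterion); the function names, state variables and the 8 Å cutoff are illustrative.

```python
def minimal_distance_map(states):
    """Shortest distance per residue pair over all states.

    states is a list of dicts, one per model, mapping (i, j) -> distance in Å.
    A pair absent from some states keeps the minimum over the states that have it.
    """
    minimal = {}
    for distances in states:
        for pair, d in distances.items():
            if pair not in minimal or d < minimal[pair]:
                minimal[pair] = d
    return minimal


def contacts_in_any_state(states, cutoff=8.0):
    """Pairs in contact in at least one state, read off the minimal map."""
    return {pair for pair, d in minimal_distance_map(states).items() if d < cutoff}


open_state = {(1, 5): 10.2, (2, 8): 6.0}
closed_state = {(1, 5): 7.1, (2, 8): 9.5, (3, 9): 4.0}
print(minimal_distance_map([open_state, closed_state]))
# {(1, 5): 7.1, (2, 8): 6.0, (3, 9): 4.0}
```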


Application examples

DCA-MOL can be used to study different types of interactions in monomeric proteins, RNA, molecular interfaces, protein conformational dynamics and multiple protein complexes. Here, we showcase DCA-MOL with two sample cases: 1) a protein interface analysis in isocitrate dehydrogenase dimers (Fig. 2) and 2) a multi-state ligand-binding protein (Fig. 3). More examples, such as coevolutionary signals in RNA 10 and in ATP synthase, can be found on the official DCA-MOL website: dca-mol.cent.uw.edu.pl.

Sample case 1 – Interfacial interactions in isocitrate dehydrogenase protein dimers.

Residues interacting across molecular interfaces can also coevolve. DCA can therefore be used not only to capture interactions within a protein monomer, but also to infer interfacial interactions in homo- or heterodimeric systems. With DCA-MOL, users can switch between and compare these interactions more efficiently. In this sample case, we use isocitrate dehydrogenase dimers to showcase the molecular interface analysis (Fig. 2). Isocitrate dehydrogenase is a homodimer containing two chains, A and B (PDB ID 2iv0; Pfam ID PF00089.25). By including both chains independently in the alignment file, loading the corresponding chain from the structure for each sequence, and indicating that they form a molecular interface, we are able to study each chain separately as well as the dimer. This mode allows efficient switching and comparison between monomeric and interfacial interactions by choosing different plot options in the drop-down list. In this example, we found that some top DI pairs are far from each other in the monomeric structure but close in the dimeric structure. This observation suggests that these interactions could be important for maintaining quaternary structure and function.

[Figure 2, panels A and B; axes: residue in 2iv0 chain A (x) and residue in 2iv0 chain B (y).]

Figure 2: DCA-MOL analysis of isocitrate dehydrogenase interfacial interactions (PDB ID: 2iv0). A) A DI map is shown in the lower triangle (x-axis for residues in chain A, y-axis for residues in chain B), and the 3D structure in the upper triangle (red for chain A, yellow for chain B). The interactions between the two chains are marked with a red rectangle in the DI map and black dashed lines in the structure. B) A collection of DCA-MOL's options and features. DCA output files used here were calculated using the DCA server (dca.rice.edu).
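The observation in this sample case, top DI pairs that are distant within the monomer but close across the dimer, can be phrased as a simple filter. A Python sketch, assuming a homodimer whose chains share residue numbering, precomputed distance dictionaries for the monomer (chain A only) and for the interface (residue of chain A, residue of chain B), and an illustrative 8 Å cutoff; `interface_candidates` is a hypothetical helper, not part of DCA-MOL.

```python
import math


def interface_candidates(pairs, intra_dist, inter_dist, top_n, cutoff=8.0):
    """Top-N (i, j, score) pairs explained by the dimer but not by the monomer.

    intra_dist[(i, j)] (i < j): distance within chain A.
    inter_dist[(a, b)]: distance from residue a of chain A to residue b of chain B.
    In a homodimer a coupling i-j may be realized as A:i-B:j or as A:j-B:i.
    """
    hits = []
    for i, j, score in sorted(pairs, key=lambda p: p[2], reverse=True)[:top_n]:
        mono = intra_dist.get((min(i, j), max(i, j)), math.inf)
        dimer = min(inter_dist.get((i, j), math.inf), inter_dist.get((j, i), math.inf))
        if mono >= cutoff and dimer < cutoff:
            hits.append((i, j, score, mono, dimer))
    return hits


pairs = [(10, 50, 0.9), (12, 14, 0.8), (30, 80, 0.7)]
intra = {(10, 50): 25.0, (12, 14): 5.0, (30, 80): 30.0}
inter = {(50, 10): 6.5, (30, 80): 20.0}
print(interface_candidates(pairs, intra, inter, top_n=3))
# [(10, 50, 0.9, 25.0, 6.5)]
```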

Sample case 2 – Conformational dynamics in periplasmic binding proteins.

While performing their function, some proteins undergo large conformational changes. DCA-MOL can be used to study the role that coevolving pairs play in different protein conformations. Here, we use DCA-MOL to illustrate the conformational change of the L-leucine-binding protein upon ligand binding (PDB IDs 1usg, open state, and 1usi, closed state). In multimodel mode, we can switch between and compare the contact maps of different states (see the change current state option). The native contact maps (lower triangle) show a similar pattern for the two states, except for a set of contacts exclusive to the closed state (Fig. 3). The predicted contact map (upper triangle) captures not only interactions within single states, but also interactions that are essential for function during conformational plasticity. For example, the contacts selected with a red rectangle appear in the contact map only for the closed, ligand-bound conformation (Fig. 3). By integrating structural information from different states, DI pairs can be used to identify an ensemble of different conformations of the protein along with their coevolutionary signals. The multiple state mode of DCA-MOL allows users to clearly visualize these interactions during protein dynamics.

[Figure 3, panels A–D; both axes: residue index.]

Figure 3: DCA-MOL analysis of L-leucine-binding protein: A,B) apo state (PDB ID 1usg); C,D) closed, holo state (PDB ID 1usi). Top: DCA-MOL's interactive plots of Direct Information (upper triangle) and contact maps of the structures (lower triangle); contacts present only in the closed conformation are selected with a red rectangle. Bottom: PyMOL visualization of the structure (red), with the predicted interactions selected on the plot shown as purple bonds.
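Contacts exclusive to one conformation, such as the closed-state contacts highlighted in Fig. 3, can be extracted with set operations. A Python sketch, assuming per-state contact sets of (i, j) pairs with i < j (for example from a distance cutoff applied to each state) and DI pairs already mapped to the structure's residue numbering; the state labels, example numbers and function names are hypothetical.

```python
def exclusive_contacts(contacts_by_state, state):
    """Contacts present in `state` but in none of the other states."""
    others = set()
    for name, contacts in contacts_by_state.items():
        if name != state:
            others |= contacts
    return contacts_by_state[state] - others


def supported_pairs(pairs, contact_subset, top_n):
    """Top-N (i, j, score) DI pairs that fall inside contact_subset."""
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:top_n]
    return [(i, j, s) for i, j, s in top if (min(i, j), max(i, j)) in contact_subset]


contacts_by_state = {
    "open": {(1, 5), (2, 9)},
    "closed": {(1, 5), (2, 9), (40, 120), (41, 118)},
}
closed_only = exclusive_contacts(contacts_by_state, "closed")
ranked = [(120, 40, 0.8), (1, 5, 0.9), (3, 70, 0.5)]
print(supported_pairs(ranked, closed_only, top_n=3))   # [(120, 40, 0.8)]
```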

Conclusions

DCA-MOL provides a highly efficient tool to analyze coevolutionary residue-residue interactions from DCA. It automates many tasks, such as multiple mappings and the generation of contact maps and distance maps. DCA-MOL is highly interactive, allowing the user to dynamically change residues of interest and seamlessly switch between different structural states. DCA-MOL can be adapted for many different kinds of analyses, such as structure prediction, binding site identification, the study of protein conformational dynamics, folding pathway analysis and interface interaction inference. Although tailored to analyze DCA couplings, this plugin can work with other inference algorithms that provide residue-residue metrics of coevolutionary correlations. We expect that scientists and users in the field of coevolutionary analysis will benefit from its many features and find it useful for many different applications.

Funding

This work has been supported by the National Science Centre [#2012/07/E/NZ1/01900 to JIS], the European Molecular Biology Organization [#2057 to JIS] and funds from the University of Texas at Dallas [FM].

Acknowledgement

The authors thank Agata Perlinska for invaluable help with feature design and Ammar Adenwalla for testing of the plugin.

References

(1) Morcos, F.; Pagnani, A.; Lunt, B.; Bertolino, A.; Marks, D. S.; Sander, C.; Zecchina, R.; Onuchic, J. N.; Hwa, T.; Weigt, M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, E1293–E1301.
(2) Weigt, M.; White, R. A.; Szurmant, H.; Hoch, J. A.; Hwa, T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 67–72.
(3) Altschuh, D.; Lesk, A.; Bloomer, A.; Klug, A. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J. Mol. Biol. 1987, 193, 693–707.
(4) Göbel, U.; Sander, C.; Schneider, R.; Valencia, A. Correlated mutations and residue contacts in proteins. Proteins: Struct., Funct., Bioinf. 1994, 18, 309–317.
(5) Livesay, D. R.; Kreth, K. E.; Fodor, A. A. Allostery; Springer, 2012; pp 385–398.
(6) Taylor, W. R.; Hatrick, K. Compensating changes in protein multiple sequence alignments. Protein Eng., Des. Sel. 1994, 7, 341–348.
(7) Burger, L.; Van Nimwegen, E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 2010, 6, e1000633.
(8) De Juan, D.; Pazos, F.; Valencia, A. Emerging methods in protein co-evolution. Nat. Rev. Genet. 2013, 14, nrg3414.
(9) Sulkowska, J. I.; Morcos, F.; Weigt, M.; Hwa, T.; Onuchic, J. N. Genomics-aided structure prediction. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 10340–10345.
(10) De Leonardis, E.; Lutz, B.; Ratz, S.; Cocco, S.; Monasson, R.; Schug, A.; Weigt, M. Direct-coupling analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res. 2015, 43, 10444–10455.
(11) Michel, M.; Skwark, M. J.; Menéndez Hurtado, D.; Ekeberg, M.; Elofsson, A. Predicting accurate contacts in thousands of Pfam domain families using PconsC3. Bioinformatics 2017, 33, 2859–2866.
(12) Taylor, W. R.; Sadowski, M. I. Structural constraints on the covariance matrix derived from multiple aligned protein sequences. PLoS One 2011, 6, e28265.


(13) Morcos, F.; Jana, B.; Hwa, T.; Onuchic, J. N. Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 20533–20538.
(14) Cheng, R. R.; Morcos, F.; Levine, H.; Onuchic, J. N. Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, E563–E571.
(15) Dabrowski-Tumanski, P.; Jarmolinska, A.; Sulkowska, J. Prediction of the optimal set of contacts to fold the smallest knotted protein. J. Phys.: Condens. Matter 2015, 27, 354109.
(16) Jamroz, M.; Niemyska, W.; Rawdon, E. J.; Stasiak, A.; Millett, K. C.; Sulkowski, P.; Sulkowska, J. I. KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res. 2014, 43, D306–D314.
(17) Bai, F.; Morcos, F.; Cheng, R. R.; Jiang, H.; Onuchic, J. N. Elucidating the druggable interface of protein–protein interactions using fragment docking and coevolutionary analysis. Proc. Natl. Acad. Sci. U. S. A. 2016, 113, E8051–E8058.
(18) Dos Santos, R. N.; Morcos, F.; Jana, B.; Andricopulo, A. D.; Onuchic, J. N. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci. Rep. 2015, 5, 13652.
(19) Schug, A.; Weigt, M.; Onuchic, J. N.; Hwa, T.; Szurmant, H. High-resolution protein complexes from integrating genomic information with molecular simulation. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 22124–22129.
(20) Zschiedrich, C. P.; Keidel, V.; Szurmant, H. Molecular mechanisms of two-component signal transduction. J. Mol. Biol. 2016, 428, 3752–3775.


(21) Skwark, M. J.; Croucher, N. J.; Puranen, S.; Chewapreecha, C.; Pesonen, M.; Xu, Y. Y.; Turner, P.; Harris, S. R.; Beres, S. B.; Musser, J. M.; Parkhill, J.; Bentley, S. D.; Aurell, E.; Corander, J. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis. PLoS Genet. 2017, 13, e1006508.
(22) Bitbol, A.-F.; Dwyer, R. S.; Colwell, L. J.; Wingreen, N. S. Inferring interaction partners from protein sequences. Proc. Natl. Acad. Sci. U. S. A. 2016, 113, 12180–12185.
(23) DeLano, W. L. The PyMOL users manual. DeLano Scientific, San Carlos, CA 2002, 452.
(24) Oteri, F.; Nadalin, F.; Champeimont, R.; Carbone, A. BIS2Analyzer: a server for coevolution analysis of conserved protein families. Nucleic Acids Res. 2017, 45, W307–W314.
(25) Kamisetty, H.; Ovchinnikov, S.; Baker, D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. U. S. A. 2013, 201314045.
(26) Marks, D. S.; Colwell, L. J.; Sheridan, R.; Hopf, T. A.; Pagnani, A.; Zecchina, R.; Sander, C. Protein 3D structure computed from evolutionary sequence variation. PLoS One 2011, 6, e28766.
(27) Källberg, M.; Wang, H.; Wang, S.; Peng, J.; Wang, Z.; Lu, H.; Xu, J. Template-based protein structure modeling using the RaptorX web server. Nat. Protoc. 2012, 7, 1511.
(28) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The protein data bank. Nucleic Acids Res. 2000, 28, 235–242.
