Article
Partner-Specific Prediction of Protein-Dimer Stability from Unbound Structure of Monomer
Hamid Hadi-Alijanvand and Maryam Rouhani
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00606 • Publication Date (Web): 14 Feb 2018
Journal of Chemical Information and Modeling
Partner-Specific Prediction of Protein-Dimer Stability from Unbound Structure of Monomer
Hamid Hadi-Alijanvand,†,* Maryam Rouhani†
† Department of Biological Sciences, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran.
* Corresponding Author: Hamid Hadi-Alijanvand, Email: [email protected] TEL: +982433153316 FAX: +982433153322 Department of Biological Sciences, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran.
ACS Paragon Plus Environment
Abstract
Protein complexes play decisive roles in living organisms in sensing, compiling, controlling, and responding to external and internal stimuli. Thermodynamic stability is an important property of protein complexes; knowledge of complex stability helps us to understand the basis of protein-assembly-related diseases and the mechanism of protein assembly. The enormous number of protein–protein interactions detected by high-throughput methods necessitates fast methods for predicting the stability of protein assemblies both quantitatively and qualitatively. The existing methods for predicting complex stability require the three-dimensional (3D) structure of the intended protein complex. Here, we introduce a new method for predicting the dissociation free energy of subunits by analyzing the structural and topological properties of the protein binding patch on a single subunit of the desired protein complex. The method needs only the 3D structure of one subunit and the position of the intended binding site on the surface of that subunit to predict dimer stability in a classwise manner. The patterns of structural and topological properties of the protein binding patch are decoded by recurrence quantification analysis. Nonparametric discrimination is then used to predict the stability class of the intended dimer with an accuracy greater than 85%.
1. Introduction
Cells utilize protein–protein interaction (PPI) networks to sense, analyze, and respond to signals. The study of PPI networks and of the relations between a cell’s PPI networks (the interactome) has been one of the most active research areas since the genomics revolution. Studies on protein–protein complexes help us, for example, to describe the complexity of metabolic regulation, gene expression regulation, signal transduction, and other holistic features of the cell.1 X-ray crystallography and NMR spectroscopy are established methods for determining the three-dimensional (3D) structure of protein complexes at the atomic level. Scientists map the interface of interacting protein subunits from the 3D structure of the complex in two ways: by identifying surface residues that become buried upon complex formation or by setting cutoffs for the inter-subunit distance.2-5 If the 3D structures of the individual monomers are available, the binding interface of the desired protein complex can be defined by protein-docking or knowledge-based methods with acceptable precision. This information about the binding interface is the prerequisite for studying, at the molecular level, how mutations, environmental variables such as ion concentration, and small molecules and drugs affect the behavior of the protein complex.
An important property of a protein complex is the thermodynamic stability of the complex structure in solution. To compensate for the shortcomings of time-consuming and expensive experimental methods for determining complex stability, many computational methods have been developed.6-12 The stability of a complex is proposed to be a function of the affinity between subunits and the entropy gains of the subunits during dissociation. Steered molecular dynamics simulations, biasing-force methods, and thermodynamic integration methods are examples of robust molecular dynamics approaches to compute the stability of protein complexes.
These methods require heavy computation to estimate the free energy of dimer dissociation. The docking method can predict the interface between monomers on the basis of their 3D structures.13 Scoring functions are defined to select the best dimer structures in docking and to compute the binding affinity of the engaged monomers. The docking results must then be evaluated to estimate the probability that the dimer is present in solution. The computational requirements of docking encourage researchers to devise simpler, efficient methods for predicting protein–protein affinity. Some precise data-driven computational methods use the 3D structure of the protein complex to predict protein–protein binding affinity.6, 9-11, 14 Data-driven methods utilize the structural information of a small population of protein dimers whose binding affinities have been determined experimentally; using fitting methods, they derive scores for fast and accurate prediction of protein–protein binding affinity.10 The challenge for these methods is the consistency of experimental conditions, both in the affinity measurements and in the crystallographic solutions of the dimers used.
Some methods use empirical energy functions to predict protein-complex stability. FoldX is designed to compute changes in protein stability due to mutations.15 Furthermore, it has an algorithm that computes the free energy of protein–protein binding using the FoldX force field and considering the stabilities of the dimer and monomer structures. As a popular and accepted method for predicting the biological assembly of crystal structures submitted to the PDB, “Protein Interfaces, Surfaces, and Assemblies (PISA)” uses an empirical free energy function to predict the thermodynamic stability of protein complexes.16 PISA computes the dissociation free energy of proposed protein complexes by calculating the free energy of complex formation and the entropy of complex dissociation. It detects the most probable biological assembly by a recursive approach. PISA enumerates all possible assemblies of molecules in a complex with the aid of crystal information using a graph-matching algorithm.16, 17 The standard free energy of dissociation (∆Gdiss) is computed for each possible complex. By backtracking the proposed states and considering the chemical stabilities of the computed states, PISA resolves the possible biological units. PISA assigns the quaternary structure of a protein with 5–10% error; the error diminishes for complexes with high dissociation free energy.18 The PISA-computed thermodynamic stability correlates qualitatively with the binding energy of protein associations computed by molecular dynamics.19
In the methods mentioned above, the 3D structure of the complex is essential for predicting protein–protein binding affinity or complex stability. Suppose we are interested in studying the stability of the biological assembly of proteins A and B when the 3D structure of only one subunit has been determined. Template-based or machine-learning methods help us predict the binding patch on the surface of the monomer that has a determined 3D structure.20-24 A large number of binary protein–protein interactions have been identified by high-throughput methods, whereas only a small number of 3D structures of protein dimers are deposited in the PDB. Therefore, the scenario assumed above is realistic. For this situation, we introduce a new method that uses the information embedded in the binding patch of one monomer to predict the dissociation free energy of protein dimers in a classwise manner. To make the prediction, we conduct a nonlinear analysis of binding-patch properties. We propose a stability code that is calculated for the binding-patch residues of a complex. These calculated values have a specific recurrence pattern on each binding patch. To decode the possible pattern of factors that determine complex stability, we apply recurrence quantification analysis (RQA) to the proposed code on binding patches of crystallized complexes. The RQA outputs are then used to predict the class of the dimer’s dissociation free energy in a monomer-based manner. Some studies have extracted hidden structural features of proteins using RQA.25-29 To our knowledge, this is the first time that recurrence analysis of structure-derived parameters of the protein binding patch (PBP) has been used to predict protein–protein-complex stability. We name the method introduced in the current study, which predicts protein-dimer stability from the properties of the PBP on a single subunit, PARDIS (protein’s PAtch Resolves DImer Stability).
2. Methods
2.1. The utilized database
To create a library of protein binding patches, we use the freely accessible PISA server and database. All protein heterodimers with no S–S bridge in the binding interface are selected. The resulting proteins are then filtered with the PDB for heterodimer stoichiometry in the biological assembly, exactly two chains in the crystal, no modified residues, and no ligand in the crystal structure. Redundant sequences are excluded at the 70% similarity level. Hereafter, we refer to this subset of PISA as the PISA HeteroDimer database (PHD). The PDB IDs of the retrieved structures are available in the Supporting Information (Table S1).
In our study, a PBP includes a central part and a surrounding part, so it accounts for the possible effects of non-interacting regions of the protein surface on complex stability.30 For each chain of a given protein dimer, the central part of the binding patch is composed of all residues that are at most 4 Å away from the opposite chain (inter-subunit residue distance ≤ 4 Å). The surrounding part of the binding patch is defined as all residues whose Cβ atom is at most 5 Å away from the central part of the binding patch on the same chain (intra-PBP residue distance ≤ 5 Å).
2.2. Model inputs
In the current study, a simple complex-stability code (CSC) is defined for every residue of a PBP. For each residue in the binding patch, we calculate a coarse-grained average affinity for binding (AAB) using a protein docking potential31 (Table S2), and then we define the CSC:
[Equation 1, which defines the CSC, is not recoverable from this copy of the manuscript.] (1)
In equation 1, the residue whose CSC is computed and its neighbor are marked by “a” and “b”, respectively. The distance between residues “a” and “b” in the binding patch is denoted by “d”. The relative protrusion of each residue is defined as the ratio of its distance from the protein’s center of mass to the protein’s radius of gyration. The average of the relative protrusions of residues “a” and “b” is computed and used in equation 1 as “R” (Figure S1).
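The binding-patch definition of Section 2.1 and the relative protrusion used in equation 1 can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the residue-to-chain distance is taken as the minimum atom–atom distance, the Cβ distance to the central part is measured to any atom of the central part, each residue is represented by a single point when its protrusion is computed, and unit atomic masses are used unless masses are supplied. All function and parameter names are hypothetical.

```python
import numpy as np

def min_distance(xyz_a, xyz_b):
    """Smallest Euclidean distance between two sets of 3D points."""
    diff = xyz_a[:, None, :] - xyz_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min()

def central_residues(chain, partner_atoms, cutoff=4.0):
    """Indices of residues with any atom within `cutoff` angstroms of the partner chain.

    `chain` is a list of (n_atoms, 3) arrays, one per residue (assumed layout).
    """
    return [i for i, atoms in enumerate(chain)
            if min_distance(atoms, partner_atoms) <= cutoff]

def surrounding_residues(chain, cbeta, central, cutoff=5.0):
    """Non-central residues whose C-beta lies within `cutoff` of the central part."""
    if not central:
        return []
    central_atoms = np.vstack([chain[i] for i in central])
    return [i for i in range(len(chain))
            if i not in central
            and min_distance(cbeta[i][None, :], central_atoms) <= cutoff]

def relative_protrusion(points, all_atoms, masses=None):
    """Distance from the center of mass divided by the radius of gyration."""
    if masses is None:
        masses = np.ones(len(all_atoms))  # assumption: unit masses
    com = np.average(all_atoms, axis=0, weights=masses)
    sq_dist = ((all_atoms - com) ** 2).sum(axis=1)
    rg = np.sqrt(np.average(sq_dist, weights=masses))
    return np.linalg.norm(points - com, axis=1) / rg
```

The value R of equation 1 would then be the mean of the two protrusions returned for residues a and b. Real coordinates would come from a PDB parser, which is left out here to keep the sketch self-contained.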
In addition to the CSCs of residues on a PBP, the following are computed: total, polar, and hydrophobic accessible surface area (ASA) of the monomer; total, polar, and hydrophobic ASA of the central part of the PBP; and the total number of atoms in the central region of the binding patch. PISA defines protein-complex thermodynamic stability using an empirical energy function that needs information about the 3D structure of the complex. The PISA-proposed standard-state complex-dissociation Gibbs free energy is formulated as:16, 18
$$\Delta G_{diss} = -\Delta G_{int} - T\Delta S = -\left(\Delta G_{solv} + N_{hb}E_{hb} + N_{sb}E_{sb} + N_{db}E_{db}\right) - T\left((n-1)C + \Delta S_{rot} + \Delta S_{trans} + 2F\Delta\sigma\right) \qquad (2)$$
In equation 2, solvation-, rotation-, and translation-related terms are abbreviated to “solv”, “rot”, and “trans”, respectively. The “hb”, “sb”, and “db” subscripts refer to hydrogen bonds, salt bridges, and disulfide bonds, whose numbers are denoted by N. The nonbonded interaction energy is denoted by “E”. The parts of the accessible surface areas of the monomers that are masked in the interface during dimer formation are denoted by ∆σ. “F” and “C” are derived from data fitting. The binding free energy, ∆Gint, combines the solvation energy and the energies of interactions between subunits. The rigid-body entropy change during complex formation is denoted by ∆S. For ease of writing, we omit the standard-state notation when referring to the PISA dissociation free energy. A highly positive ∆Gdiss indicates high stability of the intended dimer. We assume that there are hidden patterns of recurrence of CSCs in PBPs. The recurrence plot (RP) of CSCs for each binding patch has a complex structure. Therefore, to quantify the patterns in the resulting RP, we use general metrics computed by recurrence quantification analysis (Figure S1).
2.3. Recurrence Quantification Analysis
The concept of recurrence was introduced in 1890. Eckmann utilized the recurrence plot to study the recurrence aspects of nonstationary systems.32 Using an RP, we are able to learn about the recurrence of the properties of dynamic systems. The RP is generated from a recurrence matrix
(RM) that is derived from a distance matrix of the raw data.33 The elements of the recurrence matrix are defined as:
$$R_{i,j}^{m,\varepsilon} = H\left(\varepsilon - \lVert x_i - x_j \rVert\right) \qquad (3)$$
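As a rough illustration of equation 3, the sketch below builds a recurrence matrix from a one-dimensional series of values and computes two standard RQA measures, the recurrence rate and the determinism (the share of recurrent points on diagonal lines). It is written under assumptions and is not the authors' code: the time-delay embedding with a `delay` parameter, the function names, and the choice of these two measures are illustrative, and the parameters the study actually uses are those listed in its Table S3. A point counts as recurrent only when the distance is strictly smaller than the threshold.

```python
import numpy as np

def embed(series, m, delay=1):
    """Time-delay embedding: row k is (s[k], s[k+delay], ..., s[k+(m-1)*delay])."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (m - 1) * delay
    if n < 1:
        raise ValueError("series too short for this embedding")
    return np.stack([series[k * delay:k * delay + n] for k in range(m)], axis=1)

def recurrence_matrix(series, m, eps, delay=1):
    """Binary matrix with 1 where embedded points are closer than eps."""
    x = embed(series, m, delay)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (dist < eps).astype(int)

def recurrence_rate(rm):
    """Fraction of recurrent points (main diagonal included)."""
    return rm.mean()

def run_lengths(line):
    """Lengths of consecutive runs of 1s in a 1D 0/1 array."""
    change = np.diff(np.concatenate(([0], line, [0])))
    return np.flatnonzero(change == -1) - np.flatnonzero(change == 1)

def determinism(rm, lmin=2):
    """Share of off-diagonal recurrent points on diagonal lines of length >= lmin."""
    n = rm.shape[0]
    if n < 2:
        return 0.0
    lengths = np.concatenate([run_lengths(np.diagonal(rm, k))
                              for k in range(-(n - 1), n) if k != 0])
    total = lengths.sum()
    return float(lengths[lengths >= lmin].sum() / total) if total else 0.0
```

For a strictly alternating series such as 0, 1, 0, 1, 0, 1 with m = 1 and a threshold of 0.5, the matrix is a checkerboard: half of its entries are recurrent and every off-diagonal recurrent point lies on a diagonal line.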
In our study, the elements of the distance matrix are defined by the Euclidean distance, denoted by double vertical bars in equation 3, between all CSC values of residue i (x_i) and those of residue j (x_j) (Figure 1). The PBP-specific embedding dimension (m) is used to create the corresponding distance matrices. If the distance between i and j is smaller than the threshold, ε, then the Heaviside (H) function returns 1; otherwise it returns 0. Each nonzero value of the RM is presented in the RP as a dot. In the current study, the sequence of a protein is presented as a time series of AAB values, which represent the average propensity of each amino acid for protein interaction. During the protein folding process, the neighbors of amino acids change sequentially until the 3D structure becomes stable. In this way, the meaning of AAB, embedded in the CSC definition, is carried from a one-dimensional context into the 3D environment. We compute the CSC for all amino acids that form a PBP and create a distance matrix. The time series of CSC values of each PBP is converted to a recurrence matrix and then to an RP. The hidden patterns of CSCs in the RP are revealed by RQA, which quantifies the pattern of vertical and diagonal lines on the RP (Figure 2). Finally, for each partner of a protein complex, RQA-computed parameters (Table S3) and surface-related parameters define the properties of a PBP that are used to perform flexible discriminant analysis (FDA). The borders of the classes in FDA are defined by the PISA ∆Gdiss values of the PBPs, which vary from 0 to 55 kcal/mol.
2.4. Classification
FDA is one option for performing nonlinear multiclass classification. FDA is a variant of common linear discriminant analysis; unlike its ancestor, FDA is able to perform nonparametric regression to handle multiclass predictions.34 FDA may use several methods to perform the regression step.
Given N observations (indexed by i from 1 to N) in L classes (indexed by l from 1 to L) with measured features x, it is possible to predict classes by a scoring function θ using a linear regression α of the measured features.35 Minimizing the average squared residual (ASR), we obtain the separator region between classes:
:
/(01 , 2 3) = 1/ ∑>