Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE
Article
Model Building of Antibody–Antigen Complex Structures Using GBSA scores Noriko Shimba, Narutoshi Kamiya, and Haruki Nakamura J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00066 • Publication Date (Web): 12 Sep 2016 Downloaded from http://pubs.acs.org on September 13, 2016
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Model Building of Antibody–Antigen Complex Structures Using GBSA scores Noriko Shimba,*† Narutoshi Kamiya,‡, ǁ, § and Haruki Nakamuraǁ †
Device Research Laboratory, Advanced Research Division, Panasonic Corporation, 3-4
Hikaridai, Seika-cho, Sorakugun, Kyoto 619-0237, Japan ‡
Advanced Institute for Computational Science, RIKEN, QBiC Building B, 6-2-4 Furuedai,
Suita, Osaka 565-0874, Japan ǁ
Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871,
Japan
ACS Paragon Plus Environment
1
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 26
ABSTRACT: Structure prediction of antibody–antigen complexes, which involves molecular docking to generate decoys that are ranked using a scoring function, is an important approach in the design of antibody drugs and biosensors. However, it is not easy to evaluate the stability of protein–protein complexes using a single scoring function. Here, we developed a prediction method of antibody-antigen complex structures using the docking engine “surFit” and a scoring function (GBSA score) that combined the generalized Born (GB) energy and the hydration energy based on the solvent-accessible surface area (SA). We chose 95 antibody–antigen structural data sets for self-docking and generated many decoy structures using the surFit program. The GBSA scores were computed for all of the decoys, and the area under the curve (AUC) of the GBSA scores yielded a higher value (0.972) than the values obtained by the original surFit scores (0.873) and the ZRANK scores (0.953). To improve the accuracy of prediction, molecular dynamics (MD) simulations were performed for several decoy structures that had good GBSA scores. Consequently, average GBSA scores from MD trajectories can reduce the number of non-native decoys that have GBSA scores competitive with the near-native ones.
ACS Paragon Plus Environment
2
Page 3 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
INTRODUCTION
For the purpose of designing antibodies used in medical devices and as drugs, the antibody needs to have high specificity and affinity to capture a biomarker or antigen. Currently, highaffinity antibodies are designed by the mutation of amino acid residues at the antibody– antigen interface based on structural data. For example, for the vascular endothelial growth factor (VEGF) and its antibody, the binding results of alanine scanning of VEGF and the crystal structure of VEGF in complex with its antibody have been used to design target libraries, followed by random mutagenesis of the complementarity-determining regions (CDRs). Consequently, a high-affinity antibody was chosen from the library by affinity selection.1–3 Furthermore, guided by the crystal structure of the complex between VEGF and its antibody, an antibody with additional charged residues at the interface was designed by producing point mutations of surface residues.1–3 Since the development of new antibodies by experimental methods, for example, mutagenesis, usually requires significant time and effort, the development of a rapid computational method for accurately predicting the structure of antibody–antigen complexes would be very useful for the development of new high-affinity antibodies. Neither the prediction of antibody–antigen complex structures using computers nor the design of highaffinity antibodies based on the predicted structure is straightforward because of limited accuracy of such predictions.1–3 A variable region fragment (FV) of an antibody consists of a light chain (VL) and a heavy chain (VH). The binding site in the FV has six loops (i.e., CDR-L1, -L2, -L3 in VL and CDRH1, -H2, -H3 in VH). CDR-H3 plays a particularly important role in binding an antigen. The binding site of an antibody is almost limited to the CDR. Therefore, the prediction of an antibody–antigen complex structure requires prediction of the position of the antigen bound to the CDR of the antibody.
ACS Paragon Plus Environment
3
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 4 of 26
There are several web services and programs for protein–protein docking, for example, surFit4, 5 (URL, http://sysimm.ifrec.osaka-u.ac.jp/surFit/), RosettaDock,6 and ZDOCK.7 They generate decoys starting from crystal structures or modeled structures, followed by ranking of the decoys with a scoring function. The surFit docking engine generates decoys with all possible pairwise matching between two proteins. The decoys are then ranked by a coarse score consisting of the shape complementarity and a van der Waals penalty. Some accurate scoring functions were used to re-score the surFit result.8, 9 In antibody–antigen complexes, electrostatic interactions are, in general, more dominant than hydrophobic interactions when compared with other protein–protein complexes.10 Since the interfaces between antibodies and antigens interact with the solvent when they are not in complex, the affinity of the complex is estimated by the difference between the free energy of the complex and the sum of the free energies of the two proteins in the apo-state. A simple approximation method for evaluating electrostatic and solvation free energies is to use the continuum solvation model, computing the former electrostatic energy by the generalized Born (GB)11 method and the latter hydration free energy from the solvent-accessible surface area (SA).12 This method, termed the GB/SA method, has been frequently applied to evaluate binding affinities between proteins and ligands,13–15 and to predict protein–protein docking after filtering with a knowledge base score.16 In this study, we developed a new prediction method that combines molecular dynamics (MD) simulations with a scoring function using the GB/SA method. We generated 95 antibody–antigen docking data sets from the Protein Data Bank (PDB)17 and extracted each antibody and antigen monomer. For self-docking, many complex structures were generated as decoys by using the surFit program. Those decoys were ranked by our scoring function, “GBSA score”, using the GB/SA method. Our GBSA score was shown to have better performance than ZRANK scores. MD simulations were then performed to refine decoys
ACS Paragon Plus Environment
4
Page 5 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
chosen from the above self-docking procedures. We found that the prediction accuracy was improved by introducing the average GBSA scores along the MD trajectories over the GBSA scores.
ACS Paragon Plus Environment
5
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 26
DATA SETS Data sets for self-docking were chosen from the PDB according to the following conditions. (1) Antibody FV fragments that have a heavy chain (VH) and a light chain (VL) were used as antibody structures of data sets. Single-chain FV fragments and antibody FV fragments with missing atoms were excluded. The decision on whether amino acid sequences of antibodies correspond to the antibody VH or VL fragments was made using the consensus sequence by the Kabat numbering scheme.18 (2) Antigens bound to the CDR of the antibody were used for the data sets. Here, antigens with missing atoms except the termini and whose sequence lengths are < 10 residues were excluded. (3) Structures that have low resolution (> 2.8 Å) were also excluded. Finally, 95 antibody–antigen data sets for self-docking were successfully generated (Table S1), where the near-native decoys of all of the data sets are within the top 100. The definition of a near-native decoy is shown in the METHODS section. A summary of the antibody–antigen data sets for self-docking is as follows: The length of the antigens ranged from 10 to 60 residues, and the length of the CDR-H3 ranged from 3 to 22 residues. The percentage ratio of charged residues in the antigens ranged from 0% to 50%, and the ratio of charged residues present in the CDR of the antibodies ranged from 6% to 25%. Asp, Glu, Arg, and Lys were defined as charged residues.
ACS Paragon Plus Environment
6
Page 7 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
METHODS Docking Between an Antibody and an Antigen We executed self-docking using the docking engine surFit to generate decoys, followed by scoring with the original surFit score, where shape complementarity and a van der Waals penalty were calculated. The top 100 decoys in each data set were stored. A near-native decoy of the data set for the self-docking was defined as the highest-ranked structure that satisfies equation (1). (fnat ≥ 0.5) OR [fnat ≥ 0.3 AND (L_rms ≤ 5.0 OR I_rms ≤ 2.0)]
(1)
Here, the ratio of correct residue–residue contacts (fnat), root-mean-square deviation (RMSD) of the main chain of the antigen (L_rms), and RMSD of the main chain of the binding site (I_rms) were used following the quality criteria of CAPRI (Critical Assessment of PRediction of Interactions).19
GBSA Score The GBSA score for the antibody–antigen interaction was computed as follows. The total energy using the GBSA method (Etotal_GBSA) is the sum of the intra-molecular covalentbondrelated energy (Ecov), the electrostatic energy (Eele), the van der Waals energy (EvDW), the generalized Born energy (EGB), and the hydration energy estimated from the solventaccessible surface area (EASA) (Eq. 2). Etotal_GBSA = Ecov + Eele + EvDW + EGB + EASA
(2)
In this study, we defined the GBSA score as the difference between the total energy of the complex structure (Etotal_GBSA_c) and the sum of the total energies of the antibody structure (Etotal_GBSA_ab) and the antigen structure (Etotal_GBSA_ag) (Eq. 3). GBSA score = Etotal_GBSA_c - (Etotal_GBSA_ab + Etotal_GBSA_ag)
(3)
ACS Paragon Plus Environment
7
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 26
Here, we used Etotal_GBSA_c in the energy-minimized structure of each decoy to remove steric clashes between atoms, and Etotal_GBSA_ab and Etotal_GBSA_ag were calculated using the same structures as the minimized structure of the complex. The Ecov energy term should be canceled by subtraction because the complex structure is exactly the same as each apo-structure. Therefore, the GBSA score consists of the following three terms in a physicochemical sense: (1) the electrostatic interaction energy between two molecules including the dielectric shielding effect due to the GB term; (2) the van der Waals force between two molecules; and (3) the difference of the hydration energies at the interfaces. The electrostatic and van der Waals interaction energies correspond to the direct contribution between two molecules, and the EGB and EASA terms are associated with the solvent contributions. Thus, the GBSA score reflects the solvent free energy, in addition to direct interaction energies. In the calculation of GBSA score, the cosgene program20 in myPresto (URL, http:/presto.protein.osaka-u.ac.jp/myPresto4) was used after adding hydrogen atoms by the tplgene program in myPresto. In the current GB/SA method, a probe water radius of 1.4 Å was used for computation of the accessible surface area.21 All of the parameters for GB/SA computation in the cosgene program were taken from the TINKER v3.9 program.22 The calculation time of the GBSA score was dependent on the size of the molecule, with times ranging between 40 s and 10 min to calculate a GBSA score with energy minimization.
Scoring Each decoy in the data sets for self-docking was examined by the surFit score, the GBSA score, and the ZRANK score. Here, the ZRANK score was calculated by the ZRANK program10 after adding hydrogen atoms to the decoy.
ACS Paragon Plus Environment
8
Page 9 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
We compared these scoring functions using receiver operating characteristic (ROC) curves, which show the relationship of the true positive fraction (TPF) against the false positive fraction (FPF) of the near-native decoys of the data sets. Higher area under the curve (AUC) values of ROC indicate better accuracy of prediction of the complex structures.
Molecular Dynamics Simulations The program psygene-G23 was used for the MD simulations. This program executes a rapid MD calculation using GPGPU (general-purpose computing on graphics processing units) for calculating electrostatic interactions and by using the space decomposition algorithm. The zero-dipole summation method,24 which is a cut-off-based method for calculating the electrostatic interaction and has high accuracy comparable to that of the particle mesh Ewald method,25 is implemented in the psygene-G program and has been applied to various systems.26–29 Water molecules were placed explicitly in a rectangular box around the antibody and the antigen. Here, the protocols of the MD simulations are briefly described. First, 0.1 ns NPT (constant number of particles, pressure, and temperature) simulation at 300 K, 1 bar, and with 0.5 fs MD-time steps was performed to equilibrate the system. Then, a 10 ns canonical (constant number of particles, volume, and temperature) simulation at 300 K with 1 fs MDtime steps was carried out as a productive run. In this productive run, covalent bonds between heavy atoms and hydrogen atoms were constrained using the SHAKE method.30 We analyzed the MD trajectory using I_RMSD and GBSA score. The MD trajectory from 4 to 10 ns after the RMSD of the antigen atoms had reached equilibrium was used for scoring. Here, I_RMSD is the RMSD of the antigen atoms at the antibody–antigen interface superimposed onto the antibody structures during the MD simulation with reference to the initial structure of each MD simulation.
ACS Paragon Plus Environment
9
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 26
RESULTS and DISCUSSION Self-Docking Self-docking was performed using the docking engine surFit to generate 1,000 putative decoy clusters for each of the 95 antibody–antigen complexes. For the top 100 decoy structures, the GBSA scores and the ZRANK scores were computed, and the ranks of the near-native decoys in the 95 data sets by the two scoring functions in addition to the original surFit scores are shown in Table S1. In 87 data sets, the near-native decoys were ranked within the top 10 by the GBSA scores, in 64 data sets by the surFit scores, and in 81 data sets by the ZRANK scores, suggesting that the GBSA score could be a good measure to predict the antibody– antigen complex. In addition, as shown in Figure 1, the near-native decoys were predominantly ranked first with the GBSA scores, particularly for the near-native decoys with the GBSA scores less than −60 kcal/mol. In Figure 2, the RMSDs of the three examples referring to the crystal complex structures are presented with the GBSA scores, where the GBSA scores of the near-native decoys gave the highest ranks. In the cases of 1A3R (Figure 2a) and 2P8L (Figure 2b), the near-native decoys had the lowest GBSA scores of −142.6 and −96.1 kcal/mol, respectively. In each case, the difference between the near-native decoy and the second-ranked non-native decoy was > 20 kcal/mol. In the case of 3G5Y, the near-native decoy had the lowest GBSA score (−50.1 kcal/mol), although the difference between the near-native decoy and the second-ranked nonnative decoy was small (Figure 2c). As shown in these examples, for 48 of 64 data sets whose near-native decoys were ranked the highest, the GBSA scores of the second-ranked nonnative decoys were more than 10% higher than the GBSA scores of the near-native decoys. Thus, the near-native decoys in many data sets could be identified successfully by the current GBSA scores.
ACS Paragon Plus Environment
10
Page 11 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Figure 1. Ranks of the near-native decoys for 95 antibody–antigen data sets in self-docking by the GBSA scores.
Figure 2. The GBSA scores plotted against the RMSD values of the antigen structures taken from the X-ray structures of the complexes, where all of the antibody heavy atoms are superimposed. a) 1A3R, b) 2P8L, and c) 3G5Y.
ACS Paragon Plus Environment
11
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 12 of 26
The ROC curves of the near-native decoys for 95 data sets ranked by the original surFit scores, the GBSA scores, and the ZRANK scores are shown in Figure 3. The rise of the ROC curve by the GBSA scores was the steepest in all three cases, and the TPF reached 1 when the FPF was 0.2. In the case of the ZRANK scores, the rise of ROC was as rapid as that for the GBSA scores. However, the TPF of ZRANK was saturated at FPF ~0.05, and when TPF reached 1, the FPF value was ~0.5; this was much larger than the case in the GBSA scores. In the case of the surFit scores, the rise of the curve was slow and the TPF also slowly reached 1. The AUCs for the surFit, GBSA, and ZRANK scores were 0.873, 0.972, and 0.953, respectively, indicating that the prediction accuracy using the current GBSA scores was the highest. A significant difference in the ROC curves between the cases for the GBSA scores and the ZRANK scores was observed when the TPF was between 0.8 and 1. The data sets whose near-native decoy was ranked highest by the GBSA scores and ranked worse by the ZRANK scores were 1FJ1, 1IC4, 1J1O, and 1NDG (Table S1).
ACS Paragon Plus Environment
12
Page 13 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Figure 3. ROC curves of the self-docking for 95 antibody–antigen data sets, where the nearnative decoys were ranked by the surFit scores (dashed line), the GBSA scores (black solid line), and the ZRANK scores (gray solid line).
To understand the reason why the GBSA scores gave the highest accuracy, we investigated the contributions from generalized Born energies and hydration energies, by computing the GB term, ∆EGB, and the hydration term, ∆EASA. Here, the differences were counted between the corresponding energies in the decoy complex structures (EGB_c and EASA_c) and the sums of those in the separated antibody structures (EGB_ab and EASA_ab) and the antigen structures (EGB_ag and EASA_ag) as in equations (4) and (5).
∆EGB = EGB_c - (EGB_ab + EGB_ag)
(4)
∆EASA = EASA_c - (EASA_ab + EASA_ag)
(5)
First, the GB term of the near-native decoy for each data set was compared with the hydration term (Figure 4). The contributions from the GB terms, ∆EGB, were about 45-fold larger than those from the hydration terms, ∆EASA, and so the GB terms essentially governed the GBSA scores. The large contributions from the GB terms should reflect that polar and ionic residues, which are located on the surfaces of the apo-structures, but varied in the complex structures, form attractive interactions in many complex structures. Namely, the GB/SA method estimates the dielectric shielding effects well for the surface charges of the apo-proteins, which are much larger than those for ion pairs and hydrogen bonds at the antibody–antigen varied interfaces.31 Thus, those GB terms, ∆EGB, had large positive values, and so the GBSA scores could give accurate predictions for the antibody–antigen complexes with many polar and ionic attractive interactions at the interfaces.
ACS Paragon Plus Environment
13
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 26
Figure 4. The GB terms (∆EGB, solid squares) and the hydration terms (∆EASA, open diamonds) of all near-native decoys plotted against the GBSA scores.
However, in several cases, the current GBSA scores were not able to identify the correct complex structures. As shown in Figure 1, when the absolute values of GBSA scores were less than 60 kcal/mol, several near-native decoys were not ranked at the first position. In the cases of 1EO8, 2WUC, and 3EFD, the prediction results by the GBSA scores were worse than those from either the original surFit or the ZRANK scores (Table S1). There are two possible reasons for this. The first is that the area of the binding site was small. In fact, in the cases of 1EO8 and 3EFD, the areas of the near-native decoy were 674 and 573 Å2, respectively, which are noticeably smaller than those of the first-place decoy (1032 and 908 Å2). The second reason is that the GB term for the near-native decoy was small. In fact, in the case of 2WUC, ∆EGB was 217.2 kcal/mol and the area of the binding site was 1031 Å2, which indicates that the interfaces interact loosely. In addition, we ranked the near-native decoys of 19 antibody–antigen data sets in ZDOCK benchmark 3.0 using the ZDOCK scores and the GBSA scores (Table S2). The results using the GBSA scores (AUC=0.949) were good, as were those using ZDOCK (0.918).
ACS Paragon Plus Environment
14
Page 15 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Improvement of Prediction by MD Simulations Because the surFit docking engine does not always provide the correct complex structures, an MD simulation could improve the detailed interactions between the antibody and antigen molecules. Thus, we performed short MD simulations to remove the strains of the complex structures. Here, three representative data sets (PDB IDs: 1MVU, 1YJD, and 2QQN) for MD simulations were chosen, features of which are as follows: (1) An antibody with a helical peptide as the antigen (PDB ID: 1MVU): the GBSA score of the near-native decoy, −42.4 kcal/mol, was the worst (the absolute value was the smallest) in the current 95 data sets, as shown in Figure 1 and Table S1. (2) An antibody with the antigen, human CD28 (PDB ID: 1YJD): the absolute value of the GBSA score of the near-native decoy (−51.9 kcal/mol) was also small, and the rank of the near-native decoy by the GBSA score was 11 in 100 decoys. (3) An antibody with the antigen, neuropilin (PDB ID: 2QQN): the surface area of the binding site was the smallest, 490 Å2, in the 95 data sets. Each antigen consisted of 13 (1MVU), 118 (1YJD), and 154 (2QQN) residues. The top 10 decoys of each set ranked by the GBSA scores were chosen as the initial candidates for the successive MD simulations. The near-native decoy of the set of 1YJD was not found within the top 10 decoys and thus the decoy ranked 11 (Table S1) was also included as an initial candidate for the MD simulation. Decoys whose antigen did not bind to the CDR or its vicinity18 were excluded. Finally, six decoys in 1MVU, seven decoys in 1YJD, and ten decoys in 2QQN were chosen as the initial structures of MD simulations, in addition to the native crystal structures as their control data. The number of atoms and box sizes of the native structures in canonical MD runs were 39,761 and 66.8 × 63.8 × 93.8 Å3 (1MVU); 86,050 and 91.3 × 90.5 × 104.6 Å3 (1YJD); and
ACS Paragon Plus Environment
15
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 26
88,970 and 91.6 × 116.5 × 83.3 Å3 (2QQN), respectively. For the largest 2QQN system, the MD simulations for 10 ns took 3.4 d using eight parallelized CPUs plus eight graphics processing units (GPUs) (NVIDIA Tesla K20). Each MD trajectory was analyzed by computing the I_RMSD values, and the trajectories are shown in Figure 5. I_RMSDs along the trajectories starting from the native crystal structures in all three systems as their control (red) were < 3 Å during the 10-ns simulation period. I_RMSDs of one near-native decoy trajectory (light green) in 1YJD was also < 3 Å during the 10-ns simulation (Figure 5b). I_RMSDs of the other near-native decoys reached ~4 Å after 3 ns, and they attained values of 3–3.5 Å during the latter half of the simulation period (Figures 5a and c). From the rather short MD trajectories, it appears that I_RMSDs reached equilibrium after 4 ns in most cases. Accordingly, the decoys that had large I_RMSDs after 4 ns could be judged as true negative decoys.
ACS Paragon Plus Environment
16
Page 17 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Figure 5. I_RMSDs of antigens during MD trajectories starting from the selected decoys and the native structures for three data sets: a) 1MVU, b) 1YJD, and c) 2QQN. native: native structure, near-: near-native decoy, and the three-digit numbers give the ranking of the decoy by the GBSA scores.
It is expected that the antibody–antigen complex predicted as the non-native decoy would be so unstable that the antigen would dissociate from the antibody during the MD simulation. In fact, the antigen of some non-native decoys dissociated from the antibody during even the short 10-ns period, as shown in Figures 5a–c, and the MD trajectories starting from the native crystal structures and the near-native decoys were stable by monitoring I_RMSD. Although many antigens of the non-native decoys did not dissociate from the antibodies, I_RMSDs of native structures and near-native decoys were small enough, and I_RMSDs of non-native decoys were equal to or larger than those of the near-native decoys. Therefore, I_RMSD should be a good measure to reject some non-native decoys. We then examined whether the MD simulations could improve the accuracy of the complex prediction. In Figure 6, the averaged GBSA scores from 4 to 10 ns after equilibrium are shown with the average RMSDs of the antigens from 4 to 10 ns of the MD trajectories with reference to the X-ray structures. Interestingly, in many cases, the average GBSA scores of the non-native decoys deteriorated after the MD simulations, compared with those of the initial structures. In contrast, the average GBSA scores of both the native structures and the near-native decoys maintained values similar to those of the initial structures. Consequently, the average GBSA score during the MD trajectory could, in general, provide a better measure to select the nearnative structures among many decoys.
ACS Paragon Plus Environment
17
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 18 of 26
However, one non-native decoy in 1MVU, as shown in Figure 6a, was extremely stable, and it was still identified as the most putative decoy, although the complex structure was very different from the native one (RMSD ~11 Å). For 1YJD, two non-native decoys in Figure 6b (RMSD ~20 Å, ~23 Å) were also as stable as the near-native decoy.
Figure 6. The GBSA scores plotted against RMSDs of structures of antigens with reference to the X-ray structures, where the antibody atoms are superimposed. a) 1MVU, b) 1YJD, and c) 2QQN. Open circles: the GBSA scores of the initial structures against the RMSDs of the antigens in the initial MD structures. Filled circles: the average GBSA scores from 4 to 10 ns of the MD trajectories against the average RMSDs from 4 to 10 ns of the MD trajectories with reference to the X-ray structures. Crosses: the average GBSA scores from 4 to 10 ns, where the I_RMSD during MD trajectories reached values > 10 Å.
ACS Paragon Plus Environment
18
Page 19 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
The native structure, the near-native decoy, and the best non-native decoy ranked by the average GBSA scores in the three systems are shown in Figure 7. The position of the backbone chain of each near-native structure is the same as that of the native structure. The directions of the side chains and the interactions between the side chain atoms at the interfaces were reproduced well in the near-native decoys. In 1MVU, the direction of the helical antigen in the best non-native decoy was opposite to that of the native antigen. Here, two ionic residues of the antigen of the non-native decoy, Glu11 and Arg13, which do not interact with the native antibody structure or the near-native decoy, formed a hydrogen bond and a salt bridge, respectively, with the antibody (Figure 7a). In the case of the best non-native decoys of 1YJD, the antibody binds to the antigen at a different site from the binding site observed in the native structure. Here, the antibody Lchain Tyr94 and H-chain Asn55 of 1YJD, which do not interact in the native structure or the near-native decoy, interact with the antigen (Figure 7b). For the best non-native decoys of 2QQN, L-chain Ser94 and H-chain Tyr57 of the antibody, which do not interact in the native structure or the near-native decoy, interact with the antigen (Figure 7c). Thus, the MD simulations can not only refine complex structures,32 but also the GBSA scores during the MD trajectories could aid in discriminating between a near-native decoy and non-native decoys, although there are still examples where the non-native decoys show GBSA scores that are as good as the near-native decoy scores.
ACS Paragon Plus Environment
19
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 20 of 26
Figure 7. X-ray and predicted structures of the three systems: a) 1MVU, b) 1YJD, and c) 2QQN. Red: X-ray structures of the antigens; Green: near-native structures; Blue: best nonnative decoy structures; Brown: X-ray structures of the antibodies.
Conclusion As shown in the above results, we succeeded in improving the prediction accuracy for antibody–antigen complex structures. The method is summarized as follows: (1) generation of decoys of antibody–antigen docking data sets using surFit, and the ranking of the decoys by the original surFit score; (2) rescoring and ranking of the decoys by the GBSA score, calculated using the structural energy of the GB/SA method; (3) executing MD simulations of
ACS Paragon Plus Environment
20
Page 21 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
the several top ranked decoys; and (4) selecting two or three decoys, including a near-native decoy, by analyzing MD trajectories using the average GBSA score. There are two aspects that should be examined in the future. First, we have prepared data sets from crystal structures in this study. It is also possible to predict an antibody–antigen complex structure using the amino acid sequence of the antibody as a query, when this method is combined with an accurate model of the antibody structure.33–35 By generating accurate model structures of antibodies with the aid of generalized ensemble methods such as Multi-canonical MD (McMD)36 or Replica-Exchange MD (REMD),37 we can analyze various immunological phenomena and design artificial antibodies. Second, we can predict the binding free energy between an antibody and an antigen based on the model structure, including induced fitting and folding phenomena for both the antigen and the antibody, by applying the sophisticated Adaptive Umbrella Sampling (AUS) method.38 These steps should facilitate the selection and design of antibodies with much higher performance.
ASSOCIATED CONTENT Supporting Information The data sets for self-docking, scoring results, GBSA score, ∆EGB, and ASA of the binding site are shown in Table S1, and the data sets in ZDOCK benchmark 3.0 and scoring results are listed in Table S2.
AUTHOR INFORMATION Corresponding Author *E-mail:
[email protected] ACS Paragon Plus Environment
21
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 22 of 26
Present Address §
N.K.: Graduate School of Simulation Studies, University of Hyogo, 7-1-28 Minatojima
Minami-machi, Chuo-ku, Kobe, Hyogo 650-0047, Japan Notes
The authors declare no competing financial interests.
ACKNOWLEDGMENTS This work was supported by Grants-in-Aid for Scientific Research C (25440070 and JP16K07331) from JSPS to NK. This work was performed in part under the Cooperative Research Program of the Institute for Protein Research, Osaka University (CR-15-05 to NK). NS thanks Dr. Toshihiko Yoshioka and Mr. Akihito Kamei, Panasonic Co. Ltd., for their guidance.
REFERENCES (1) Chen, Y.; Wiesmann, C.; Fuh, G.; Li, B.; Christinger, H. W.; McKay, P.; de Vos, A. M.; Lowman, H. B. Selection and analysis of an optimized anti-VEGF antibody: Crystal structure of an affinity-matured Fab in complex with antigen. J. Mol. Biol. 1999, 293, 865881. (2) Marvin, J. S.; Lowman, H. B. Redesigning an antibody fragment for faster association with its antigen. Biochemistry 2003, 42, 7077-7083. (3) Lippow, S. M.; Wittrup, K. D.; Tidor, B. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat. Biotechnol. 2007, 25, 1171-1176. (4) Kanamori, E.; Murakami, Y.; Tsuchiya, Y.; Standley, D. M.; Nakamura, H.; Kinoshita, K. Docking of protein molecular surfaces with evolutionary trace analysis. Proteins 2007, 69, 832-838. (5) Kanamori, E.; Murakami, Y.; Sarmiento, J.; Liang, S.; Standley, D. M.; Shirota, M.; Kinoshita, K.; Tsuchiya, Y.; Higo, J.; Nakamura, H. Prediction of protein-protein complex structures. In Biomolecular Forms and Functions Bansal, M.; Srinivasan, N., Eds.; IISc Press-WSPC Publication: 2013, pp 160-172. (6) Lyskov, S.; Gray, J. J. The RosettaDock server for local proteinprotein docking. Nucleic. Acids Res. 2008, 36, W233-W238. (7) Mintseris, J.; Pierce, B.; Wiehe, K.; Anderson, R.; Chen, R.; Weng, Z. P. Integrating statistical pair potentials into protein complex prediction. Proteins 2007, 69, 511-520.
ACS Paragon Plus Environment
22
Page 23 of 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
(8) Tsuchiya, Y.; Kanamori, E.; Nakamura, H.; Kinoshita, K. Classification of heterodimer interfaces using docking models and construction of scoring functions for the complex structure prediction. Adv. Appl. Bioinf. Chem. 2009, 2, 79-100. (9) Fleishman, S. J.; Whitehead, T. A.; Strauch, E. M.; Corn, J. E.; Qin, S.; Zhou, H. X.; Mitchell, J. C.; Demerdash, O. N.; Takeda-Shitaka, M.; Terashi, G.; Moal, I. H.; Li, X.; Bates, P. A.; Zacharias, M.; Park, H.; Ko, J. S.; Lee, H.; Seok, C.; Bourquard, T.; Bernauer, J.; Poupon, A.; Aze, J.; Soner, S.; Ovali, S. K.; Ozbek, P.; Tal, N. B.; Haliloglu, T.; Hwang, H.; Vreven, T.; Pierce, B. G.; Weng, Z.; Perez-Cano, L.; Pons, C.; Fernandez-Recio, J.; Jiang, F.; Yang, F.; Gong, X.; Cao, L.; Xu, X.; Liu, B.; Wang, P.; Li, C.; Wang, C.; Robert, C. H.; Guharoy, M.; Liu, S.; Huang, Y.; Li, L.; Guo, D.; Chen, Y.; Xiao, Y.; London, N.; Itzhaki, Z.; Schueler-Furman, O.; Inbar, Y.; Potapov, V.; Cohen, M.; Schreiber, G.; Tsuchiya, Y.; Kanamori, E.; Standley, D. M.; Nakamura, H.; Kinoshita, K.; Driggers, C. M.; Hall, R. G.; Morgan, J. L.; Hsu, V. L.; Zhan, J.; Yang, Y.; Zhou, Y.; Kastritis, P. L.; Bonvin, A. M.; Zhang, W.; Camacho, C. J.; Kilambi, K. P.; Sircar, A.; Gray, J. J.; Ohue, M.; Uchikoga, N.; Matsuzaki, Y.; Ishida, T.; Akiyama, Y.; Khashan, R.; Bush, S.; Fouches, D.; Tropsha, A.; Esquivel-Rodriguez, J.; Kihara, D.; Stranges, P. B.; Jacak, R.; Kuhlman, B.; Huang, S. Y.; Zou, X.; Wodak, S. J.; Janin, J.; Baker, D. Community-wide assessment of protein-interface modeling suggests improvements to design methodology. J. Mol. Biol. 2011, 414, 289-302. (10) Li, C. H.; Ma, X. H.; Chen, W. Z.; Wang, C. X. A protein-protein docking algorithm dependent on the type of complexes. Protein Eng. 2003, 16, 265-269. (11) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized Models of Aqueous Free Energies of Solvation Based on Pairwise Descreening of Solute Atomic Charges from a Dielectric Medium. J. Phys. Chem. 1996, 100, 19824–19839. (12) Richmond, T. J. Solvent Accessible Surface-Area and Excluded Volume in Proteins Analytical Equations for Overlapping Spheres and Implications for the Hydrophobic Effect. J. Mol. Biol. 1984, 178, 63-89. (13) Kadirvelraj, R.; Gonzalez-Outeirino, J.; Foley, B. L.; Beckham, M. L.; Jennings, H. J.; Foote, S.; Ford, M. G.; Woods, R. J. Understanding the bacterial polysaccharide antigenicity of Streptococcus agalactiae versus Streptococcus pneumoniae. Proc. Natl. Acad. Sci. USA 2006, 103, 8149-8154. (14) Zhang, X. L.; Li, X.; Wang, R. X. Interpretation of the Binding Affinities of PTP1B Inhibitors with the MM-GB/SA Method and the X-Score Scoring Function. J. Chem. Inf. Model. 2009, 49, 1033-1048. (15) Li, Y.; Liu, Z. H.; Wang, R. X. Test MM-PB/SA on True Conformational Ensembles of Protein-Ligand Complexes. J. Chem. Inf. Model. 2010, 50, 1682-1692. (16) Chowdhury, R.; Rasheed, M.; Keidel, D.; Moussalem, M.; Olson, A.; Sanner, M.; Bajaj, C. Protein-Protein Docking with F(2)Dock 2.0 and GB-Rerank. Plos One 2013, 8, e51307. (17) Berman, H.; Henrick, K.; Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003, 10, 980-980. (18) MacCallum, R. M.; Martin, A. C. R.; Thornton, J. M. Antibody-antigen interactions: Contact analysis and binding site topography. J. Mol. Biol. 1996, 262, 732-745. (19) Mendez, R.; Leplae, R.; Lensink, M. F.; Wodak, S. J. Assessment of CAPRI predictions in rounds 3-5 shows progress in docking procedures. Proteins 2005, 60, 150-169. (20) Fukunishi, Y.; Mikami, Y.; Nakamura, H. The filling potential method: A method for estimating the free energy surface for protein-ligand docking. J. Phys. Chem. B 2003, 107, 13201-13210. (21) Kinjo, A. R.; Kidera, A.; Nakamura, H.; Nishikawa, K. Physicochemical evaluation of protein folds predicted by threading. Eur. Biophys. J. Biophys. Lett. 2001, 30, 1-10.
ACS Paragon Plus Environment
23
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 24 of 26
(22) Ponder, J. W.; Case, D. A. Force fields for protein simulations. Adv. Protein Chem. 2003, 66, 27-85. (23) Mashimo, T.; Fukunishi, Y.; Kamiya, N.; Takano, Y.; Fukuda, I.; Nakamura, H. Molecular Dynamics Simulations Accelerated by GPU for Biological Macromolecules with a Non-Ewald Scheme for Electrostatic Interactions. J. Chem. Theory. Comput. 2013, 9, 55995609. (24) Fukuda, I.; Yonezawa, Y.; Nakamura, H. Molecular dynamics scheme for precise estimation of electrostatic interaction via zero-dipole summation principle. J. Chem. Phys. 2011, 134, 164107. (25) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577-8593. (26) Fukuda, I.; Kamiya, N.; Yonezawa, Y.; Nakamura, H. Simple and accurate scheme to compute electrostatic interaction: Zero-dipole summation technique for molecular system and application to bulk water. J. Chem. Phys. 2012, 137, 054314. (27) Kamiya, N.; Fukuda, I.; Nakamura, H. Application of zero-dipole summation method to molecular dynamics simulations of a membrane protein system. Chem. Phys. Lett. 2013, 568-569, 26-32. (28) Arakawa, T.; Kamiya, N.; Nakamura, H.; Fukuda, I. Molecular Dynamics Simulations of Double-Stranded DNA in an Explicit Solvent Model with the Zero-Dipole Summation Method. Plos One 2013, 8, e76606. (29) Nishikawa, Y.; Oyama, T.; Kamiya, N.; Kon, T.; Toyoshima, Y. Y.; Nakamura, H.; Kurisu, G. Structure of the entire stalk region of the Dynein motor domain. J. Mol. Biol. 2014, 426, 3232-3245. (30) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical-Integration of Cartesian Equations of Motion of a System with Constraints - Molecular-Dynamics of N-Alkanes. J. Comput. Phys. 1977, 23, 327-341. (31) Nakamura, H. Roles of electrostatic interaction in proteins. Quart. Rev. Biophys. 1996, 29, 1-90. (32) Takemura, K.; Guo, H.; Sakuraba, S.; Matubayasi, N.; Kitao, A. Evaluation of protein-protein docking model structures using all-atom molecular dynamics simulations combined with the solution theory in the energy representation. J. Chem. Phys. 2012, 137, 215105. (33) Yamashita, K.; Ikeda, K.; Amada, K.; Liang, S. D.; Tsuchiya, Y.; Nakamura, H.; Shirai, H.; Standley, D. M. Kotai Antibody Builder: automated high-resolution structural modeling of antibodies. Bioinformatics 2014, 30, 3279-3280. (34) Almagro, J. C.; Teplyakov, A.; Luo, J. Q.; Sweet, R. W.; Kodangattil, S.; HernandezGuzman, F.; Gilliland, G. L. Second Antibody Modeling Assessment (AMA-II). Proteins 2014, 82, 1553-1562. (35) Shirai, H.; Ikeda, K.; Yamashita, K.; Tsuchiya, Y.; Sarmiento, J.; Liang, S. D.; Morokata, T.; Mizuguchi, K.; Higo, J.; Standley, D. M.; Nakamura, H. High-resolution modeling of antibody structures by a combination of bioinformatics, expert knowledge, and molecular simulations. Proteins 2014, 82, 1624-1635. (36) Higo, J.; Ikebe, J.; Kamiya, N.; Nakamura, H. Enhanced and effective conformational sampling of protein molecular systems for their free energy landscapes. Biophys. Rev. 2012, 4, 27-44. (37) Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999, 314, 141-151. (38) Higo, J.; Dasgupta, B.; Mashimo, T.; Kasahara, K.; Fukunishi, Y.; Nakamura, H. Virtual-system-coupled adaptive umbrella sampling to compute free-energy landscape for flexible molecular docking. J. Comput. Chem. 2015, 36, 1489-1501.
ACS Paragon Plus Environment
24
Page 25 of 26
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment
25
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 26 of 26
For Table of Contents Use Only Model Building of Antibody– Antigen Complex Structures Using GBSA scores Noriko Shimba, Narutoshi Kamiya, and Haruki Nakamura
ACS Paragon Plus Environment
26