Subscriber access provided by University of Newcastle, Australia
Application Note
SAGE: a Fast Computational Tool for Linear Epitope Grafting onto a Foreign Protein Scaffold Riccardo Capelli, Filippo Marchetti, Guido Tiana, and Giorgio Colombo J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00584 • Publication Date (Web): 30 Nov 2016 Downloaded from http://pubs.acs.org on December 19, 2016
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
1
SAGE: a Fast Computational Tool for Linear
2
Epitope Grafting onto a Foreign Protein Scaffold
3
Riccardo Capelli1,2, Filippo Marchetti2, Guido Tiana1, Giorgio Colombo2,*
4
1 Center for Complexity & Biosystems and Dipartimento di Fisica, Università degli Studi di
5
Milano and INFN, via Celoria 16, 20133 Milan, Italy
6
2 Istituto di Chimica del Riconoscimento Molecolare, Consiglio Nazionale delle Ricerche, via
7
Mario Bianco 9, 20131 Milan, Italy
8 9
Graphical TOC
10 11
ACS Paragon Plus Environment
1
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 21
1
ABSTRACT
2
Computational design is becoming a driving force of structural vaccinology, whereby protein
3
antigens are engineered to generate new biomolecules with optimized immunological properties.
4
In particular, the design of new proteins that contain multiple, different epitopes can potentially
5
provide novel highly efficient vaccine candidates. In this context, epitope grafting, which entails
6
the transplantation of an antibody recognition motif from one protein onto a different protein
7
scaffold (possibly containing other immunoreactive sequences) holds great promise for the
8
realization of superantigens.
9
Herein, we present SAGE (Strategy for Alignment and Grafting of Epitopes), an automated
10
computational tool for the implantation of immunogenic epitopes onto a given scaffold. It is
11
based on the comparison between the expected secondary structures of the candidates to be
12
grafted with all the secondary structures in the target scaffold. Evaluating the differences both in
13
sequence and in structure between the epitope and the scaffold returns a ranking of most
14
probable molecules containing the new antigenic sequence. We validate this approach
15
identifying the grafting positions obtained in previous works by experimental and computational
16
methods, proving an efficient, flexible and fast tool to perform the initial scanning for epitope
17
grafting. This approach is fully general and may be applied to any target antigen and candidate
18
epitopes with known 3D structures.
19
ACS Paragon Plus Environment
2
Page 3 of 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
1
2
Introduction
3
The raise in the number of epidemics caused by new pathogens, combined with emerging drug-
4
resistance, make the development of preventive and therapeutic vaccines an urgent necessity1,2.
5
Vaccine discovery has been revolutionized by the impact of genome sequencing technologies.
6
The genomic analysis of pathogens permitted the discovery of novel candidates for protein-based
7
vaccines (antigens) directly from genomic information. This process was named “reverse
8
vaccinology” (RV) to stress the fact that vaccine discovery originated only from sequence
9
information3. Ideally, RV entails the identification of candidates at relatively late stages of the
10
development process, in contrast to the case of live-attenuated organisms. Consequently, one
11
expects a reduction of some of the bottlenecks in vaccine development, and thus a lower risk of
12
failure.
13
The reach of RV was recently expanded by the integration with structural information, which
14
should expectedly provide the molecular information necessary to obtain more effective vaccine
15
candidates. In fact, the structure-based design of antigens (termed Structural Vaccinology, SV)
16
represents a new driving force in vaccinology1.
17
Previously, we developed and tested a computational SV approach which was implemented as a
18
rapid and inexpensive tool for epitope design2-7, and still guarantees top performance, as
19
validated by comparison with in vivo and in vitro immunological tests. The overall goal of this
20
approach was the identification of immunoreactive epitopes recognized by antibodies, starting
21
from the knowledge of 3D structures of the immunogenic antigens. The structural information
22
permits the use of an epitope prediction method called Matrix of Local Coupling Energies
23
(MLCE)8, now implemented in a webserver (http://bioinf.uab.es/BEPPE)9. Using this approach,
ACS Paragon Plus Environment
3
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 4 of 21
1
we successfully designed seroreactive peptides for several diseases, including melioidosis10 and
2
cystic fibrosis11.
3
A possible improvement to the direct use of peptidic epitopes consists in implanting them onto a
4
scaffold of interest to improve their stability. Moreover, to maximize the efficacy and the
5
durability of the immune response, multiple epitopes can be inserted into the starting structure.
6
This procedure, termed Epitope grafting, has been widely used on virus-like particles (VLPs)
7
since early 1980s12,13 and was pioneered for immunogenic proteins by the Schief group14-17.
8
Epitope grafting involves the transplantation of a structural/functional motif onto a structurally
9
homologous region of an unrelated protein target, possibly already hosting other reactive
10
sequences. The antigen target protein must display one or more regions that are conformationally
11
compatible with those of the epitope to be grafted. Once the epitope is transplanted, its
12
conformation should be stable in the new context, to support optimal presentation for the binding
13
of the antibody and for processing by the immune system. The availability of an automated tool
14
for the design of multi-epitope antigens is expected to be a relevant support for vaccine
15
development and for their testing in different contexts, from structural biology to immunology.
16
In particular, it permits to rapidly select stable protein constructs containing a foreign antigen
17
among an ensemble of alternative solutions, without resorting to complex and lengthy techniques
18
of structural biology and modeling.
19
Our computational pipeline, called SAGE (Strategy for Alignment and Grafting of Epitopes),
20
was tested analyzing the results of blind linear epitope grafting predictions, and benchmarking
21
against known cases of successful design reported in the literature15-20. It is found that SAGE
22
identifies successfully the best experimentally-validated candidates among its top scoring
23
solutions without any preliminary knowledge-based input.
ACS Paragon Plus Environment
4
Page 5 of 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
1 2
Workflow
3
The prediction of the grafting position requires the knowledge of the structure of the target, of
4
the structure of the whole immunogenic cognate protein that contains the epitope to be
5
transplanted and of the position of the epitope along the sequence. It consists of 4 phases: 1)
6
Structural/sequence alignment, 2) Secondary structure prediction, 3) Structure scoring, 4)
7
Exposed surface scoring. SAGE is written as a Python 2.7 package that performs the analysis by
8
accessing external programs, such as PSIBLAST and Naccess, via the Internet.
9
Alignment. The program takes the user-defined linear epitope from the original crystallographic
10
structure of the cognate protein and performs 3 different types of alignment onto the target
11
protein. For this step PyMOL21 scripts are used, namely a pure sequence alignment (Pymol script
12
align), a structure-based alignment (Pymol script super) and hybrid alignment (Pymol script
13
CEAlign22). In every alignment run, the script returns the 3 best candidates. Not to obtain a larger
14
set of suboptimal candidates, the alignment is repeated for several cycles, constraining the search
15
to segments of decreasing length which cover exhaustively the protein, as shown in Figure 1. At
16
the end of this first part, SAGE generates a set of FASTA files containing the sequences of all
17
the grafting candidates.
18
--- Figure 1 here ---
19
Secondary structure scoring. Since the secondary structure of the cognate fragment and of the
20
scaffold can change upon grafting, we evaluated the secondary-structure propensities of the two
21
based solely on their sequence, independently of the actual secondary structure they display. For
22
this purpose we applied the s2D method23, which returns a per-residue α/β formation probability
ACS Paragon Plus Environment
5
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 21
1
based on sequence information. Applying the prediction to a scaffold of N amino acids, the /
2
propensities can be seen as two different points a, in a N-dimensional Euclidean space.
3
After the alignment, we obtained M different sequences of N amino acids each. Thus, we can
4
define a distance , , , , = ∑
5
the difference in each of the two types of secondary structure between any two sequences. These
6
distances are calculated between the scaffold and the sequences obtained from the alignment to
7
evaluate how similar the grafting candidates are expected to be to the original structure. The
8
scores on the two types of secondary structures are then merged in a score for secondary-
9
structure scaffold compatibility, defined as
,
,
−
between such points that quantifies
10
=
11
To take into account the similarity between the epitope in its original protein and in the
12
candidates, we also evaluated the similarity in secondary-structure propensity in the epitope-
13
containing segment. If this is composed by L residues, we compare L-dimensional vector ,
14
for the epitope in its cognate protein and two different L-dimensional vectors , for every
15
candidate. Therefore, we define another score which reports the epitope secondary structure
16
compatibility
17
$ =
18
Exposed surface scoring. The structural information available is used to evaluate how grafting
19
affects the exposed surface of the protein. The exposed surface area is measured using Naccess24.
20
For every candidate, SAGE, via the PyMOL mutagenesis tool, creates a new structure mutating
21
the residues of the original scaffold with those of the new epitope using the most probable
22
rotamer. The exposed surface is evaluated for all epitope residues in the cognate protein and for
,!
+
#
.
,!
+ ,!. ,! #
ACS Paragon Plus Environment
6
Page 7 of 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
1
the corresponding grafted residues on all the putative structures. We obtain a L-dimensional
2
vector a with the reference exposure and, having M candidates, a collection of M L-dimensional
3
vectors for the exposure of the grafted fragment. We can compute again an Euclidean distance
4
, = ∑% − between the per-residue exposed area a measured on the epitope in
5
the cognate protein and b measured on the grafted epitope, and an exposure score is defined as
6
$&' =
7
Selection of the candidates. To restrict the number of viable candidates, we need to define a
8
single score. The main problem with the three scores defined above is that they are
9
incommensurable, being defined on different scales. To make them comparable, every score is
.
,!
10
normalized between 0 and 1 dividing it by the highest score found in the whole candidate pool.
11
A reactive epitope in an immunogenic context has to be exposed to the solvent to be recognized
12
by the immune system. To exploit the structural information given by the exposed surface
13
analysis and to penalize the candidates with a hidden grafted epitope we use the normalized
14
exposed surface score as a weight for the remaining scores (,$ =
$&' ,$ ∙ max! $&' max! ,$
15
Finally, we generate the total score as the average of the scaffold and the epitope similarity
16
reweighted scores defined above 1 ,-, = ( + ($ 2
17
Validation
18
To this end, we compared the grafting candidates obtained by SAGE with the structure shown in
19
previous structural epitope grafting works. It is important to note here that a limited number of
ACS Paragon Plus Environment
7
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 21
1
grafting reports have appeared in the literature, in particular with regards to cases where the final
2
structure of the protein could be crystallized/obtained by homology modelling. We therefore
3
retrieved the most possible available cases with structural information, running our analysis for
4
viral sequences taken from HIV-1, RSV and snake toxin16,17,19,25, in which the authors grafted a
5
relevant linear epitope onto a set of different scaffolds. We also tested SAGE performance for
6
epitope grafting on virus-like particles18,20, although the graft length in one case18 does not
7
correspond to the length of the scaffold deletion. The authors used fragments of different length,
8
ranging from extremely short17,18,20 (