
Letters pubs.acs.org/acschemicalbiology

Exhaustive Proteome Mining for Functional MHC‑I Ligands

Christian P. Koch,† Anna M. Perna,† Sabrina Weissmüller,‡ Stefanie Bauer,‡ Max Pillong,† Renato B. Baleeiro,§ Michael Reutlinger,† Gerd Folkers,†,∥ Peter Walden,§ Paul Wrede,# Jan A. Hiss,† Zoe Waibler,‡ and Gisbert Schneider*,†

† Department of Chemistry and Applied Biosciences, Eidgenössische Technische Hochschule (ETH), Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
‡ Paul-Ehrlich-Institut, Paul-Ehrlich-Str. 51-59, 63225 Langen, Germany
§ Charité - Universitätsmedizin Berlin, Department of Dermatology, Venerology and Allergology, Charitéplatz 1, 10117 Berlin, Germany
∥ Collegium Helveticum, Schmelzbergstr. 25, 8092 Zürich, Switzerland
# Charité - Universitätsmedizin Berlin, Molecular Biology and Bioinformatics, Campus Benjamin Franklin, Arnimallee 22, 14195 Berlin, Germany

Supporting Information

ABSTRACT: We present the development and application of a new machine-learning approach to exhaustively and reliably identify major histocompatibility complex class I (MHC-I) ligands among all 20^8 octapeptides and in genome-derived proteomes of Mus musculus, influenza A H3N8, and vesicular stomatitis virus (VSV). Focusing on murine H-2Kb, we identified potent octapeptides exhibiting direct MHC-I binding and stabilization on the surface of TAP-deficient RMA-S cells. Computationally identified VSV-derived peptides induced CD8+ T-cell proliferation after VSV infection of mice. The study demonstrates that high-level machine-learning models provide unique access to rationally designed peptides and a promising approach toward “reverse vaccinology”.

Polymorphic MHC-I alleles are key actors in the adaptive cellular immune response.1 The corresponding MHC-I proteins are present on nearly all nucleated cells as part of the cellular immune response against intracellular pathogens (e.g., viruses) or cancerous cells, while MHC-II proteins are associated with humoral immune response and reaction to extracellular pathogens. MHC-I proteins bind and transport peptide fragments originating from proteasomal degradation to the cell surface and present them to CD8+ T lymphocytes. Thus, prediction of a peptide’s MHC-I binding and stabilization potential constitutes a key component for the rational design and development of future vaccines for stimulating T cell-mediated immune responses, such as for cancer immunology.2,3 This concept of “reverse vaccinology” aims at finding immunogenic peptides in proteome data.4 From the computational perspective, sophisticated pattern recognition algorithms can achieve the identification of novel MHC I-binding peptides. Here, we employed nonlinear machine-learning techniques for classification of known MHC-I binding peptides as training data and the exhaustive prospective analysis of all 20^8 octapeptide sequences that can be generated with the 20 genetically coded amino acid residues, as well as of the mouse, influenza A H3N8, and VSV proteomes.5,6 The two viral pathogens were selected because of their proven immunological applicability in mouse models. For proof of concept, we focused on the murine H-2Kb allele.

Deep7 and cascaded8,9 machine-learning models have been shown to be methods of choice for solving complex pattern recognition tasks.10 The appeal of cascaded models lies in the combination of different views on the data. Typically, several individual models are developed first, which are then combined by a jury. In the simplest case, one can use consensus voting (for classifiers) or averaging (for regression models) to obtain the final prediction score. The resulting jury decision often is more robust than each individual model. A characteristic of deep learning models is their ability to extract higher-order or “hidden” features in the data that are not visible when relying on single-layer techniques. We trained a cascaded jury machine-learning classifier to recognize H-2Kb binding (positive examples) and nonbinding peptides (negative examples).11 We utilized newly derived peptide representations and employed different learning schemes, namely, multilayer feedforward networks (multilayer Perceptron, MLP12) and support vector machines (SVM13) (Figure 1a). The first layer of our jury consists of MLPs and SVMs that receive an input amino

© 2013 American Chemical Society

Received: April 12, 2013 Accepted: June 9, 2013 Published: June 17, 2013

dx.doi.org/10.1021/cb400252t | ACS Chem. Biol. 2013, 8, 1876−1881


to a jury network, which computes the decisive prediction score as a value between zero and one. Scores close to one indicate potential H-2Kb binding peptides. We focused on octapeptides as input to the machine-learning model, because this peptide length represents a lower limit for immunogenicity for this specific allele.14 During model development, numerical peptide representations were exhaustively evaluated in a combinatorial manner. The final jury model constitutes a cross- and externally validated prediction framework aiming at the prediction of allele-specific MHC-I binding octapeptides (Figure S1, Table S1). Despite almost perfect training data classification, we obtained stable cross-validated prediction accuracy for both internal and external test data (median Matthews’ correlation index15 = 0.65 and 0.62; Table S1), which is well within the quality of current state-of-the-art prediction tools for MHC-I prediction.16,17 From the prediction results obtained with individual descriptors, we observed that the descriptors encoding physicochemical properties seem to be the most relevant for the jury decision. After establishing the machine-learning classifier, we applied the prediction model to host (murine) and pathogen (influenza virus and VSV) genome-derived proteomes as well as the complete octapeptide space (25.6 bn peptides). The mouse proteome investigated here contained 42 837 proteins, the influenza A H3N8 (Hokkaido 1980) proteome 11 proteins,5 and VSV proteome six proteins. Approximately 2% of all octapeptides were confidently predicted to bind to H-2Kb, corresponding to 254 008 candidate octapeptides in the mouse proteome, 98 in the influenza virus proteome, and 82 in the VSV proteome (Table 1). Exhaustive analysis of the set of all possible octapeptides revealed approximately 422 million octapeptides with the potential of binding to this single MHC-I allele. 
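The size of the octapeptide space and the extraction of candidate octapeptides from a proteome can be illustrated with a short sketch. It assumes that candidates are generated with an overlapping sliding window of length eight, which the text does not state explicitly; the function names and the toy proteome are hypothetical and serve illustration only.

```python
from typing import Iterator

# The 20 genetically coded amino acid residues (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def octapeptide_space_size(alphabet: str = AMINO_ACIDS, length: int = 8) -> int:
    """Number of distinct peptides of the given length: 20^8 for octapeptides."""
    return len(alphabet) ** length


def sliding_octapeptides(protein: str, length: int = 8) -> Iterator[str]:
    """Yield every overlapping window of `length` residues (assumed scheme)."""
    for start in range(len(protein) - length + 1):
        yield protein[start:start + length]


def unique_octapeptides(proteome: dict[str, str], length: int = 8) -> set[str]:
    """Collect the set of distinct windows over all proteins of a proteome."""
    peptides: set[str] = set()
    for sequence in proteome.values():
        peptides.update(sliding_octapeptides(sequence, length))
    return peptides


if __name__ == "__main__":
    # Hypothetical two-protein toy "proteome", not real data.
    toy_proteome = {"protein_a": "SIINFEKLA", "protein_b": "SIINFEKL"}
    print(octapeptide_space_size())
    print(sorted(unique_octapeptides(toy_proteome)))
```

Each unique window would then be passed to the classifier; the exhaustive scan of all 20^8 sequences is the same loop over a much larger candidate set.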
Our analysis further showed that, with only a few exceptions (seven identical in mouse and influenza; two identical in mouse and VSV; all predicted as nonbinders), all proteome-derived predicted H-2Kb ligands were unique. The known canonical sequence motif of octapeptide H-2Kb binders is defined as x-x-(Y)-x-[Y/F]-x-x-[L,M,I,V] where x denotes any residue and in which residue positions 5 and 8 are referred to as anchor positions, and position 3 as a secondary anchor.18,19 Despite a prevalence of the canonical binding motif among the positive peptides in the training data, the prediction model identified potentially binding peptides without being limited to the canonical motif (Table S2, Supporting Information). It is of particular note that none of the positively predicted influenza peptides fulfills the complete motif and only 27 (12%) of the candidates possess the two main anchors. Among the positively predicted VSV-derived octapeptides, only two exhibit the full motif (0.9 the accuracy of the jury boosted to 84% for all measured data (positive and negative examples). Similarly, with a threshold of predicted IC50 < 10 μM, which accounts for the weakly binding peptides detected by us, NetMHCpan predicted 78% of all measured peptides correctly. From this preliminary external validation, we conclude that the jury machine-learning model performs well within the accuracy levels of contemporary prediction tools and shows sustained reliability despite the observed discrepancy between training and test data prediction accuracy (Table S1, Supporting Information). It is of note that NetMHCpan, as a representative of other machine-learning models for H-2Kb ligand prediction, and our model did not perform identically but were in part complementary despite both using the IEDB as training data resource.
This observation further motivates a jury approach not only for developing new machine-learning models (as in our study) but also for obtaining improved accuracy by combining several predictors. The fact that the existing methods differ in their predictions justifies the development of new techniques.
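The two simplest jury schemes mentioned in the text, consensus voting and averaging, can be sketched as follows. This is only an illustration under stated assumptions: the study's own jury is a trained MLP, not a fixed rule, and the function names, the 0.5 threshold, and the example scores are hypothetical.

```python
from statistics import mean


def average_jury(scores: list[float]) -> float:
    """Combine predictor scores in [0, 1] by simple averaging."""
    return mean(scores)


def consensus_vote(scores: list[float], threshold: float = 0.5) -> bool:
    """Return True if a strict majority of predictors scores at or above the threshold."""
    votes = sum(score >= threshold for score in scores)
    return votes > len(scores) / 2


if __name__ == "__main__":
    # Hypothetical scores from three independent predictors for one peptide.
    example_scores = [0.9, 0.7, 0.2]
    print(average_jury(example_scores))
    print(consensus_vote(example_scores))
```

When predictors disagree in complementary ways, as observed for the two models compared here, either rule lets a confident majority outweigh a single dissenting score.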

yielded a prediction score greater than 0.98, while 20 were sampled from the rest of positive prediction space (0.98 ≥ prediction score ≥ 0.5). In addition, we synthesized known positive and negative reference peptides and selected training peptides that turned out to be false-positive and false-negative predictions. SC50 values (peptide concentration at half-maximal MHC-I protein stabilization) were determined in a cell-based assay, as described previously.20 The 11 peptides yielding the highest prediction scores exhibited SC50 values below 3 μM with six peptides yielding nanomolar SC50, out of which SSFSFGGF deserves particular notice with an SC50 value of 30 ± 20 nM, which is comparable to the ovalbumin-derived positive reference peptide SIINFEKL (SC50 = 30 ± 10 nM) (Table S3, Supporting Information). This outcome is in perfect agreement with the expected confidence of our prediction model.21 Furthermore, all seven peptides selected from the overlap of the murine and influenza proteome were, in accordance with the predictions, confirmed as nonstabilizing peptides. In total, 49 peptides were measured as H-2Kb stabilizing, of which 30 (61%) exhibit a partial canonical residue motif (one anchor position), and three (6%) peptides feature neither of the two anchors. In contrast, five peptides fulfilling the entire canonical motif did not lead to measurable stabilization in the assay (11% of all negative examples). For benchmarking purposes, scores obtained by the linear matrix-based prediction tool SYFPEITHI were analyzed.19 It turned out that our jury prediction correlates more strongly with the cell-based pSC50 data than the SYFPEITHI scores (0.54 τ vs 0.47 τ; Table S4, Supporting Information). Furthermore, the jury predictions show only weak correlation (Kendall’s τ =


Further expanding the view on canonical motif dissipation to the entire set of peptides tested in this study revealed three peptides with a measurable stabilization effect despite the lack of the classical motif residues (ITFLQALQ, SC50 = 8.8 μM; YGFRHQNS, SC50 = 1.7 μM; IAFPWSIK, SC50 = 7.5 μM). Although these three peptides do not concur with the canonical motif, at least an aromatic residue at secondary anchor position 3, YGFRHQNS, exhibits stabilization potential. Of further note are five peptides (NNDNFDKL, EYYDYEAV, GIPLYDAI, LGGLYEAI, KENRFIEI) that fulfill the entire canonical motif, yet do not exhibit measurable MHC-I stabilization capacity. Apparently, the prediction model extracted subtle ligand features that go beyond the established coarse-grained motif, thereby providing more accurate means for proteome mining and identifying novel MHC-I ligands. In fact, in a recent study, we systematically analyzed the sequence variability of the SIINFEKL epitope using a similar prediction method, and identified the pentapeptide SIINF as a minimal H-2Kb binding fragment, thus qualifying the anchor positions of the canonical residue motif.23

H-2Kb stabilization on the plasma membrane is an indirect measurement of peptide binding. Therefore, as a secondary approach, we established a rapid cell-free thermal denaturation assay measuring direct peptide binding,24 thereby building on earlier work by Bouvier and Wiley,25 who employed circular dichroism spectroscopy to study thermal unfolding of HLA-A2 protein. Due to the fact that peptide-free MHC-I protein is unstable and does not assume native conformation, we used a fusion protein of IgG1 antibody with two MHC-I (H-2Kb) heavy chains and β2-microglobulin to probe for direct binding of stabilizing peptides. The melting curve of the protein complex roughly follows a two-step melting profile (Figure 2a). The first segment (30−60 °C) of the melting curve features an inflection point at approximately 50 °C, which is absent from the melting curve profiles of β2-microglobulin and IgG1 antibody alone (Figure 2b, c). We therefore reasoned that the inflection point observed in the first segment for the fusion protein results from denaturation of the peptide-binding superdomain (α1,2) of the MHC-I heavy chain. In support of this interpretation, the positive control peptide SIINFEKL and the selected stabilizing viral peptide ASGFFSRL cause dose-dependent shifts to higher temperatures (Figure 2d, f). In the case of SIINFEKL, this shift of the melting curve was not observed for the lowest peptide concentrations up to 1 μM. Nevertheless, the effect induced by the two highest peptide concentrations eliminates the intermediate plateau between the two inflection points at approximately 50 and 76 °C (Figure 2d, upper panel). No such observation was made for the negative control QDNGHDWI (Figure 2e). Based on the criteria of maximum temperature shift and inflection point log(C/M) value for the melting point curves, we qualitatively evaluated the binding potential of ASGFFSRL (Figure 2f, lower panel). Qualitatively assessed direct binding potential by thermal shift measurements fully concurs with the results obtained from the cell-based assay (Table S5, Supporting Information). Accordingly, SIINFEKL displayed the highest binding potential, and influenza virus-derived peptides SSFSFGGF, LSYRVGYL, LSYSTGAL, ASGFFSRL, SINELSNL, ILWILDRL, LPPNFSSL, RSFELKKL, FSVLFDQL, SIVPSGPL, IFLARSAL, and TSFFYRYG were confirmed as potent, direct, and stabilizing ligands of murine H-2Kb.

In a second biochemical validation experiment, we tested VSV-derived peptides for MHC-I binding. Omitting the known VSV-N epitope (RGYVYQGL, prediction score = 0.99), we synthesized all 11 confidently positive peptides. All exhibited significant (Welch-test, all p-values 0.29). Most importantly, the best computer-identified VSV peptide VSPPFLSL exhibited the same high potency as the known VSV-N reference epitope. As for the influenza example, these results corroborate our prediction method as reliable and applicable to prospective proteome mining.

In order to investigate whether the peptides predicted and identified as strong MHC-I binders also showed immune-stimulatory capacity, we analyzed both T cell proliferation and interferon-γ production upon peptide restimulation in a mouse study. We used an experimental model of VSV in vivo infection followed by an ex vivo peptide recall assay with splenocytes. Although our machine-learning model was trained to predict H-2Kb binding, we observed that the three VSV octapeptides identified as “best binders” (VSPPFLSL, NSYLYGAL, IGFLYGDL) were able to induce T cell proliferation (∼15% of proliferation as induced by the immunodominant VSV-N peptide; Figure S2, Supporting Information), yet no intracellular IFN-γ was detected (Figure S3, Supporting Information). The close to perfect agreement between our computed predictions and measured SC50 values is of particular practical use for the design of “positive” initiator peptides. The further interaction of costimulatory molecules such as those of the B-7 family with CD28 on T cells is licensing the activation of the T cell, which is promoted by the expression of cytokines such as interleukin-12 and interferons.26,27 To the best of our knowledge, the various concepts of “deep learning”28,29 have yet to be transferred and applied to this complex immunological issue. Our study thereby provides a first step toward this direction and an appreciably more reliable predictor compared to established weight matrix techniques.30−32 The suggested methodology consequently extends experimental high-throughput approaches to epitope discovery to complete proteomes,33,34 thereby providing first-hand access to all candidate sequences and enabling rational vaccine design.



METHODS

Peptide Reference Data. The Immune Epitope Database (IEDB, 10/2010) contains 179 715 entries on MHC-I binding assays for MHC allomorphs of over 14 organisms.14 It includes 2961 entries for MHC I-binding assays for the murine H-2Kb, out of which 1358 are octapeptides. Multiple occurrences of peptides or peptides with ambiguous information on H-2Kb binding were removed, yielding a selective core training set of 996 unique octapeptides with high, medium, low, and lack of affinity to MHC-I. Based on the IEDB annotations, affinity data was categorized into a positive (high, medium, low affinity) and a negative (no affinity) class. The core set was split into “internal” and “external” sets with a ratio of 4:1. “Internal” and “external” provide two evaluation perspectives with the former focusing on cross-validation and the latter with the possibility of utilizing an external validation set.35 For stratified 10-fold cross-validation, the internal set was repeatedly split into training folds and a test fold with a ratio of 9:1 with equally distributed response ratios (ratio of positives to negatives).

Cascaded Machine-Learning Model. MLP and SVM classifiers were trained on all descriptor sets respective to the evaluation perspective (training/cross-validation or external validation) in either case totaling to 12 first-stage or “single” classifiers. We evaluated all models on training data as well as on test data resulting in confusion tables. Matthews’ correlation coefficient (MCC) was chosen to map confusion tables to a single value in the range [−1,1]. All classifiers were implicitly parametrized upon training in a grid-based approach via 5-fold cross-validation, using online back-propagation of errors for MLP and an SMO-type decomposition method for SVM training.36 MLPs were optimized with respect to the number of neurons h ∈ [2,10] in the hidden layer, the learning rate η ∈ [0.1,1.0] and the momentum α ∈ [0.1,1.0], whereas SVMs were parametrized with respect to kernel choice (polynomial, radial basis, sigmoid), cost parameter C ∈ [0.5,3.0], and the kernel-specific parameters degree d ∈ [1,3], γ ∈ [0.01,0.10] and coef0 ∈ [0,6]. The resulting output neuron values from single classifier MLPs and probability estimates from single classifier SVMs were combined as jury training data where every training pattern for the jury MLP has as many dimensions as single classification models have been chosen to be included. All possible combinations of single classification models were built, which resulted in 4^6 = 4096 combinations for six descriptors and four possibilities of inclusion. The final jury model was retrained using the complete data set. The prediction model was implemented in Java (Oracle, Sun Microsystems) utilizing WEKA v3.63 machine-learning packages and a LIBSVM v3.0 wrapper.37,38 The prediction method is publicly available at http://modlab-cadd.ethz.ch (SLiDER tool, MHC-I version 2010).41

Thermal Denaturation Assay. Thermal denaturation assays were conducted using a StepOnePlus real-time PCR system (Applied Biosystems). MicroAmp optical 96-well reaction plates (Applied Biosystems cat. no. N8010560) were loaded with a total volume of 20 μL reaction mix of the following composition: 10 μL of ligand-free H-2Kb/IgG fusion protein solution (protein concentration: 0.5 mg mL−1; BD Biosciences cat. no. 550750) or 10 μL of PBS buffer (pH 7.36) for ligand-only controls; plus 2 μL of the test peptide (in PBS, pH 7.36) in respective concentrations to yield final peptide concentrations of 100 μM, 10 μM, 1 μM, 100 nM, 10 nM, 1 nM; plus 2 μL of 2X SYPRO Orange (Sigma-Aldrich cat. no. S5692); plus 6 μL (or 8 μL for blank measurements without ligand) of PBS (pH 7.36). The wells were then continuously heated from 25 to 99 °C with a heating rate of 1.5 °C min−1. Fluorescence intensity was measured using the Applied Biosystems ROX preset with respective excitation/emission maxima at 587/607 nm. The resulting data was recorded and analyzed using the StepOne software, v2.2.2 (Applied Biosystems). Melting points were computed by identifying the minimum of the derivative of the melting curve data for the segment relevant to MHC-I heavy chain denaturation. For influenza A H3N8 derived peptides, dose-dependent melting point curves were qualitatively assessed per peptide with respect to the positive reference SIINFEKL. VSV proteome-derived peptides were examined regarding the melting point of the peptide−MHC complex at 100 μM relative to the average melting point of unloaded MHC complexes (0%) and the average melting point of SIINFEKL-loaded complexes (100%).
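The melting point determination and the relative scale used for the VSV peptides can be sketched as follows. This is a minimal illustration, not the StepOne software's algorithm: it assumes the derivative is reported as −dF/dT (so that its minimum marks the steepest fluorescence increase), it takes the 30−60 °C window from the main text as the segment relevant to heavy-chain denaturation, and all function names and the synthetic curve are hypothetical.

```python
import math


def negative_derivative(temps, fluorescence):
    """Central-difference estimate of -dF/dT at the interior data points."""
    return [
        -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]


def melting_point(temps, fluorescence, segment=(30.0, 60.0)):
    """Temperature at the minimum of -dF/dT within the chosen segment (assumed window)."""
    derivative = negative_derivative(temps, fluorescence)
    in_segment = [
        (d, t)
        for d, t in zip(derivative, temps[1:-1])
        if segment[0] <= t <= segment[1]
    ]
    if not in_segment:
        raise ValueError("no data points inside the requested segment")
    return min(in_segment)[1]


def relative_stabilization(tm_peptide, tm_unloaded, tm_reference):
    """Melting point on a scale from unloaded complex (0%) to reference-loaded complex (100%)."""
    return 100.0 * (tm_peptide - tm_unloaded) / (tm_reference - tm_unloaded)


def synthetic_curve(tm=50.0, width=2.0):
    """Hypothetical sigmoidal unfolding curve sampled at 1 °C steps from 25 to 99 °C."""
    temps = [float(t) for t in range(25, 100)]
    fluorescence = [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, fluorescence


if __name__ == "__main__":
    temps, fluorescence = synthetic_curve()
    print(melting_point(temps, fluorescence))
    # Hypothetical melting points in °C, for illustration only.
    print(relative_stabilization(55.0, 50.0, 60.0))
```

Restricting the search to one segment keeps the second, higher-temperature transition of the fusion protein from being picked up as the melting point.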



Notes

The authors declare the following competing financial interests: G.S. is a consultant to the pharmaceutical industry and a cofounder and shareholder of inSili.com LLC, Zürich, and AlloCyte Pharmaceuticals Ltd, Basel.



ACKNOWLEDGMENTS

The authors thank M. Bastian for helpful discussion and S. Haller for technical support. This research was supported by the ETH Zürich and the Swiss National Science Foundation (SNF, Grant No. 205321-134783 to G. Schneider). P. Wrede was supported by the Deutsche Forschungsgemeinschaft (DFG, SFB 852); P. Walden by the BMBF (13N9197 and 13N11455) and EU-FP7 (LEISHDNAVAX). R. B. Baleeiro received support from DAAD, Germany, and FAPESP, Brazil.



(1) Hildemann, W. H. (1977) Specific immunorecognition by histocompatibility markers: The original polymorphic system of immunoreactivity characteristic of all multicellular animals. Immunogenetics 5, 193−202. (2) Sesardic, D. (1993) Synthetic peptide vaccines. J. Med. Microbiol. 39, 241−242. (3) Janeway, C. A. Jr., Travers, P., Walport, M., and Shlomchik, M. J. (2001) Immunobiology: The Immune System in Health and Disease. Garland Science, New York. (4) Rappuoli, R. (2001) Reverse vaccinology, a genome-based approach to vaccine development. Vaccine 19, 2688−2691. (5) Riberdy, J. M., Flynn, K. J., Stech, J., Webster, R. G., Altman, J. D., and Doherty, P. C. (1999) Protection against a lethal avian influenza A virus in a mammalian system. J. Virol. 73, 1453−1459. (6) Roberts, A., Buonocore, L., Price, R., Forman, J., and Rose, J. K. (1999) Attenuated vesicular stomatitis viruses as vaccine vectors. J. Virol. 73, 3723−3732. (7) Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007) Greedy layer-wise training of deep networks. Adv. Neural Inf. Process. Syst. 19, 153−160. (8) Wolpert, D. H. (1991) Stacked generalization. Neural Networks 5, 241−259. (9) Givehchi, A., and Schneider, G. (2005) Multi-space classification for predicting GPCR-ligands. Mol. Diversity 9, 371−383. (10) Ning, X., Walters, M., and Karypis, G. (2012) Improved machine learning models for predicting selective compounds. J. Chem. Inf. Model. 52, 38−50. (11) Hiss, J. A., Bredenbeck, A., Losch, F. O., Wrede, P., Walden, P., and Schneider, G. (2007) Design of MHC I stabilizing peptides by agent-based exploration of sequence space. Protein Eng. Des. Sel. 20, 99−108. (12) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986) Learning representations by back-propagating errors. Nature 323, 533−536. (13) Cortes, C., and Vapnik, V. (1995) Support vector networks. Mach. Learn. 20, 273−297. (14) Peters, B., Sidney, J., Bourne, P., Bui, H. H., Buus, S., Doh, G., Fleri, W., Kronenberg, M., Kubo, R., Lund, O., Nemazee, D., Ponomarenko, J. V., Sathiamurthy, M., Schoenberger, S., Stewart, S., Surko, P., Way, S., Wilson, S., and Sette, A. (2005) The immune epitope database and analysis resource: From vision to blueprint. PLoS Biol. 3, e91. (15) Matthews, B. W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 405, 442−451. (16) Koch, C. P., Pillong, M., Hiss, J. A., and Schneider, G. (2013) Computational resources for MHC ligand identification. Mol. Inf. 32, 326−336.

ASSOCIATED CONTENT

Supporting Information

Amino acid descriptors, cascaded machine-learning model workflow, cross-validated jury model performance, analysis of correlation between final jury predictions, canonical sequence motif analysis of training data, proteome data, peptide synthesis and analytics, MHC-I stabilization assay, mice and viruses, T cell proliferation assay, intracellular IFN-γ staining, flow cytometric analysis, results of cell-based MHC-I stabilization assay, results of direct binding assay, results of infection study. This material is available free of charge via the Internet at http://pubs.acs.org



REFERENCES

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].


(17) Zhang, H., Lundegaard, C., and Nielsen, M. (2009) Pan-specific MHC class I predictors: A benchmark of HLA class I pan-specific prediction methods. Bioinformatics 25, 83−89. (18) Rammensee, H. G., Falk, K., and Rötzschke, O. (1993) Peptides naturally presented by MHC class I molecules. Annu. Rev. Immunol. 11, 213−244. (19) Rammensee, H., Bachmann, J., Emmerich, N. P., Bachor, O. A., and Stevanović, S. (1999) SYFPEITHI: Database for MHC ligands and peptide motifs. Immunogenetics 50, 213−219. (20) Udaka, K., Wiesmüller, K. H., Kienle, S., Jung, G., and Walden, P. (1995) Decrypting the structure of major histocompatibility complex class I-restricted cytotoxic T lymphocyte epitopes with complex peptide libraries. J. Exp. Med. 181, 2097−2108. (21) Chryssolouris, G., Lee, M., and Ramsey, A. (1996) Confidence interval prediction for neural network models. IEEE Trans. Neural Networks 7, 229−232. (22) Hoof, I., Peters, B., Sidney, J., Pedersen, L. E., Sette, A., Lund, O., Buus, S., and Nielsen, M. (2009) NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics 61, 1−13. (23) Koch, C. P., Perna, A. M., Pillong, M., Todoroff, N. K., Wrede, P., Folkers, G., Hiss, J. A., and Schneider, G. (2013) Scrutinizing MHC-I binding peptides and their limits of variation. PLoS Comput. Biol. 9, e1003088. (24) Senisterra, G., Chau, I., and Vedadi, M. (2012) Thermal denaturation assays in chemical biology. Assay Drug Dev. Technol. 10, 128−136. (25) Bouvier, M., and Wiley, D. C. (1994) Importance of peptide amino and carboxyl termini to the stability of MHC class I molecules. Science 265, 398−402. (26) Arens, R., and Schoenberger, S. P. (2010) Plasticity in programming of effector and memory CD8 T-cell formation. Immunol. Rev. 235, 190−205. (27) Sharpe, A. H. (2009) Mechanisms of costimulation. Immunol. Rev. 229, 5−11. (28) Hinton, G. E., Osindero, S., and Teh, Y. W. (2006) A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527− 1554. 
(29) Bengio, Y. (2007) On the challenge of learning complex functions. Prog. Brain Res. 165, 521−534. (30) De Groot, A. S., and Berzofsky, J. A. (2004) From genome to vaccine-new immunoinformatics tools for vaccine design. Methods 34, 425−428. (31) Mishra, S., and Sinha, S. (2009) Immunoinformatics and modeling perspective of T cell epitope-based cancer immunotherapy: A holistic picture. J. Biomol. Struct. Dyn. 27, 293−306. (32) Iurescia, S., Fioretti, D., Fazio, V. M., and Rinaldi, M. (2012) Epitope-driven DNA vaccine design employing immunoinformatics against B-cell lymphoma: A biotech’s challenge. Biotechnol. Adv. 30, 372−383. (33) Fridman, A., Finnefrock, A. C., Peruzzi, D., Pak, I., La Monica, N., Bagchi, A., Casimiro, D. R., Ciliberto, G., and Aurisicchio, L. (2012) An efficient T-cell epitope discovery strategy using in silico prediction and the iTopia assay platform. Oncoimmunology 1, 1258− 1270. (34) He, Y., Rappuoli, R., De Groot, A. S., and Chen, R. T. (2010) Emerging vaccine informatics. J. Biomed. Biotechnol. 2010, 218590. (35) Tropsha, A. (2010) Best practices for QSAR model development, validation, and exploitation. Mol. Inf. 29, 476−488. (36) Fan, R. E., Chen, P. H., and Lin, C. J. (2005) Working set selection using the second order information for training SVM. J. Mach. Learn. Res. 6, 1889−1918. (37) Hall, M. (2009) The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter 11, 10−18. (38) Chang, C. C., and Lin, C. J. (2011) LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27. (39) Agrafiotis, D. K. (2003) Stochastic proximity embedding. J. Comput. Chem. 24, 1215−1221.

(40) Schneider, G., and Wrede, P. (1998) Artificial neural networks for computer-based molecular design. Prog. Biophys. Mol. Biol. 70, 175−222. (41) MODLAB, SLiDER tool, MHC-I Version 2010; ETH: Zürich, 2012; http://modlab-cadd.ethz.ch.
